A multi-stage approach is attractive for avoiding prohibitive processing in automatic
target detection/recognition (ATD/R) of sensor imagery. Multi-stage ATD/R systems
usually implement a focus of attention stage to localize the regions of interest where
possible target candidates are found, and apply a recognition stage only at the regions of
interest selected by the focus of attention stage. The focus of attention stage rejects most
clutter (uninteresting background) in sensor imagery and detects targets of interest with a
high probability of detection (usually 100% detection).

This dissertation addresses a novel approach to the design of a focus of attention stage
in a multi-stage ATD/R system. The focus of attention stage consists of two subsystems:
(1) a front-end detection stage in which a conventional two-parameter constant false alarm
rate (CFAR) detector is extended to the gamma CFAR (γCFAR) detector, which relaxes the
constraint of a fixed stencil size in the two-parameter CFAR stencil by using gamma
kernels; (2) a false alarm reduction stage implementing a quadratic detector of the
intensity features estimated with the gamma kernels, which is called the quadratic
gamma detector (QGD). The QGD extends the two-parameter CFAR test with respect to:
i) the stencil shape; ii) the features used in the decision function; iii) the selection of
weights, which are not chosen a priori but are found through optimization.

The  QGD  is  further  extended  to  a nonlinear  adaptive  structure  (multi-layer  percep- 
tron)  which  is  denoted  by  the  NL-QGD.  The  training  strategies  of  the  NL-QGD  are  dis- 
cussed in  terms  of  detection  theory.  Several  norms  such  as  Lg,  Li  j/Lg,  cross  entropy 
function,  L2  with  removal  of  non-target  outliers  during  the  training  are  implemented  to 
train  the  NL-QGD.  The  effect  of  different  norms  is  measured  in  terms  of  receiver  operat- 
ing characteristic  (ROC)  in  a large  data  set  of  synthetic  aperture  radar  (SAR)  clutter  (about 
7 km²) with targets embedded. With these new criteria, the NL-QGD was able to
surpass the performance of the QGD.




CHAPTER  1 


INTRODUCTION 

Automatic Target Detection/Recognition (ATD/R) is a challenging problem. The goal
is to detect and recognize objects of interest in clutter-dominated imagery (e.g., imagery
from a forward-looking infrared sensor, a synthetic aperture radar, or a laser radar).

Early radar systems displayed all incoming information on a screen. Clutter, noise,
and target amplitude variations were displayed simultaneously. Target detection was
performed by human operators, visually monitoring image intensity variations in order to
discriminate targets against background clutter and noise. These raw data displays are still
incorporated, in some sense, into most major systems. The objective of automatic
detection processing is to automatically detect targets and to provide target reports without
human intervention.

Background clutter, which usually dominates sensor imagery, may be divided into
two types: natural clutter, which describes natural scenery (trees, bushes, grass, forest,
etc.), and cultural clutter, which covers man-made objects (cars, bridges, power lines,
buildings, etc.). Background clutter is typically neither stationary, ergodic, nor Gaussian,
especially in high resolution imagery [55] [87]. Target signatures can vary depending
upon viewing angle and posture. The difficulty of the ATD/R problem is attributable
to such complicated variations of target signatures and background clutter in sensor
imagery.


1.1 Automatic Target Detection/Recognition Technology
Sensor technology and computing power have made great progress in forming and
acquiring sensor data. However, relatively less progress has been made in ATD/R
algorithms. Many ATD/R approaches have been proposed, including detection theory [42],
pattern recognition techniques [9] [11] [47], neural networks [3] [11] [76] [82] [87] [88],
and model-based algorithms [21] [84] [85].

Detection theory is attractive for the ATD/R problem. When target signatures and
background are described by statistical models, an optimal detector can be derived
theoretically; that is, the achievable detection probability can be determined for a given
false alarm rate. The advantage of the detection theory approach is that target signatures
and background clutter can be expressed efficiently by statistical parameters and optimal
solutions can be derived. However, this approach requires that the statistical model be valid
and analytically tractable for target signatures and background clutter. When the statistical
model does not adequately describe real-life raw data, detection performance degrades.

Pattern  recognition  representations  typically  involve  feature  extraction  from  targets 
and  background.  The  features  are  essential  to  the  target  recognition  process.  Distinction 
between  different  target  types  and  background  clutter  should  clearly  be  based  on  the 
extracted  features. 

While  many  such  efforts  have  been  made  to  solve  the  ATD/R  problem,  none  so  far  has 
succeeded  because  variations  in  both  target  signature  and  background  clutter  contribute  to 
the  difficulty  of  the  ATD/R  problem  [47]. 

1.2  A Multi-stage  Automatic  Target  Detection/Recognition  System 

Besides the significant variation of target signatures and background clutter, which
adds to the difficulty of the ATD/R problem, ATD/R systems usually have to deal with
prohibitive amounts of image data. It is tempting to seek a single algorithm that exploits
all of the information in high resolution imagery and solves the ATR problem. However,
the single-algorithm approach is computationally too expensive, and high resolution
imagery is difficult to model accurately and hence is poorly understood. Due to the
prohibitive amount of sensor imagery to be processed, real-time processing requirements
mandate efficient algorithms and powerful processing architectures. The multistage
approach becomes an attractive alternative because it progressively reduces the number
of interesting areas of the image and narrows down their consideration in the later
stages, allowing a recognition algorithm to avoid processing entire images [3]
[11] [57] [58] [60] [76] [86]. Figure 1 shows a conceptual flow of image data processed in
multistage ATR systems.


Figure 1 Data processing flow and algorithm complexity of a multi-stage ATD/R system.¹

1. The algorithm complexity does not necessarily increase linearly with the processing steps.

The detection stage can be thought of as a data reduction stage. A simple prescreening
algorithm in the detection stage operates over the entire imagery, selects regions of
interest (ROIs) where all target-like objects are found, and passes the locations of the ROIs
to the recognition stage for further consideration. The recognition stage deploys a
recognition algorithm only over the ROIs, rejects non-targets, and recognizes the remaining
targets.

The Lincoln Laboratory baseline ATR system [57] consists of three stages: the
prescreening stage (or front-end detection stage), the discrimination stage, and
the classification stage. A two-parameter constant false alarm rate (CFAR) detector serves
as a prescreener in the front-end detection stage and locates all possible target-like objects
based on pixel intensity. The discrimination stage receives from the prescreener the
locations in which target candidates are found and rejects natural clutter [9] [57]. In the
classification stage, the classifier rejects cultural clutter and assigns the remaining objects
(targets) to one of a finite number of categories.

Figure 2 depicts our multi-stage approach to the ATD/R problem. The focus of attention
block in Figure 2a can be thought of as a data reduction stage because only
regions of interest in the input imagery are passed to the classification stage. The focus of
attention is very important in multi-stage ATD/R problems because the performance of
the focus of attention stage determines the global performance of ATD/R systems in
terms of detection rates and the processing power they require.

This dissertation addresses the focus of attention for a multi-stage ATD/R system
depicted in Figure 2b. We consider the well-known two-parameter CFAR detector
[57] from a signal processing perspective. The two-parameter CFAR detector estimates the
mean and variance from a locally defined region at some distance from a test pixel. It then
thresholds the difference between the estimated target mean at the test pixel and the local
mean, normalized by the local variance. Here, the two-parameter CFAR test statistic can be
thought of as a two-moment decomposition by local operators (the two windows of the
CFAR stencil), chosen using a priori knowledge from detection theory. A new
two-parameter CFAR structure, called the γCFAR detector, is proposed which decomposes
the image on a gamma kernel basis. The γCFAR statistic is a two-moment decomposition
obtained by projecting the image onto a set of basis functions which are the 2D extensions
of the integrands of the gamma function. These integrands are called the gamma kernels
[19] [64]. A free parameter in this kernel set controls the region of support of the kernels.
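
As a rough illustration of the conventional two-parameter CFAR test described above, the following Python sketch computes the test statistic at one pixel from a square ring of surrounding pixels and thresholds it. It is only a sketch under stated assumptions: the square stencil, the `guard` and `width` parameters, the function names, and the toy image are all hypothetical, and the normalization uses the local standard deviation (the square root of the local variance estimate), which is how the test is commonly written. The gamma kernels are not implemented here.

```python
import math


def two_parameter_cfar(image, row, col, guard=2, width=1):
    """Return (x_t - mean_c) / std_c for the test pixel at (row, col).

    Clutter statistics are estimated from a square ring that begins just
    outside a guard area of half-width `guard` and is `width` pixels thick
    (an assumed stencil shape, chosen only for illustration).
    """
    outer = guard + width
    ring = []
    for dr in range(-outer, outer + 1):
        for dc in range(-outer, outer + 1):
            if max(abs(dr), abs(dc)) <= guard:
                continue  # skip the test pixel and the guard area
            r, c = row + dr, col + dc
            if 0 <= r < len(image) and 0 <= c < len(image[0]):
                ring.append(image[r][c])
    mean_c = sum(ring) / len(ring)
    var_c = sum((v - mean_c) ** 2 for v in ring) / len(ring)
    std_c = math.sqrt(var_c) if var_c > 0 else 1e-12
    return (image[row][col] - mean_c) / std_c


def detect(image, threshold, guard=2, width=1):
    """Return the pixels whose CFAR statistic exceeds the threshold."""
    return [(r, c)
            for r in range(len(image))
            for c in range(len(image[0]))
            if two_parameter_cfar(image, r, c, guard, width) > threshold]


# Toy 9x9 scene: mildly varying clutter with one bright pixel (hypothetical data).
scene = [[1.0 + 0.2 * ((r + c) % 2) for c in range(9)] for r in range(9)]
scene[4][4] = 10.0
print(detect(scene, threshold=5.0))
```

The guard area keeps target energy out of the clutter estimate; the fixed ring size is the constraint that the gamma kernels are meant to relax.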

Then, the two-parameter CFAR detector will be further analyzed and generalized
to the Quadratic Gamma Detector (QGD) [63], which is designed to be used in the false
alarm reduction stage.


Figure 2 A multistage approach to ATD/R problems.


The construction of the QGD, inspired by the two-parameter CFAR detector, is
viewed in a signal processing and pattern recognition context. The QGD effectively
constructs a set of features by using a feature extractor. The feature extractor projects image
intensity in local regions (and the squared intensity) onto the gamma kernel basis. These
features (quadratic in the image intensity) are then classified by a linear classifier (the
quadratic gamma detector, QGD) or a neural network (NL-QGD).

Preliminary tests conducted at Lincoln Laboratory showed marked improvements of
the CFAR/QGD over the conventional CFAR detector, both with 1 foot fully
polarimetric SAR data and with 1 meter single polarization SAR data [63]. Presently this
combination is used in the benchmark ATR algorithm suite at MIT/LL and also as the
focus of attention in the ARPA Monitor program.

This dissertation explains in detail the structure of the focus of attention and its
extensions to neural networks. We also discuss tests conducted in our laboratory with ISAR
targets embedded in SAR imagery (the MIT/LL ATDS Mission 90 Pass 5 data set).

1.3 Overview of the Dissertation

Chapter  2 gives  a brief  introduction  to  Synthetic  Aperture  Radar  (SAR)  and  Inverse 
SAR  (ISAR)  image  formation  used  for  this  study.  It  discusses  the  Polarimetric  Whitening 
Filter  (PWF)  as  a preprocessor  for  SAR  data  as  well  as  a target  embedding  strategy. 

Chapter 3 discusses a two-parameter Constant False Alarm Rate (CFAR) detector as a
prescreener used for the front-end detection stage of the focus of attention stage. A gamma
CFAR (γCFAR) detector is introduced as an alternative to the two-parameter CFAR detector,
using a set of gamma kernel functions. In Chapter 4, the QGD is introduced for the false
alarm reduction stage and is extended to the NL-QGD.

The  training  strategies  for  the  NL-QGD  are  discussed  in  Chapter  5.  In  Chapter  6,  the 
results  of  experiments  measuring  the  performance  of  the  focus  of  attention  in  the  ATD/R 
system  on  real-life  imagery  (Mission  90  Pass  5 SAR  data  set)  are  discussed. 

Chapter 7 concludes the study and presents a summary with recommendations for
future work.


CHAPTER  2 


SYNTHETIC APERTURE RADAR (SAR) DATA DESCRIPTION

2.1 Introduction

SAR  is  a coherent  system  in  that  it  retains  both  phase  and  magnitude  of  the  backscat- 
tered  signals  (echoes).  SAR  refers  to  a technique  used  to  synthesize  a very  long  antenna  by 
combining  echoed  signals  received  by  the  radar  as  it  moves  along  its  flight  track.  The  high 
resolution  is  achieved  by  synthesizing  an  extremely  long  antenna  aperture  [16].  Aperture 
refers  to  the  opening  used  to  collect  the  reflected  energy  that  is  used  to  form  an  image.  In 
the  case  of  a camera,  this  would  be  the  shutter  opening;  for  radar  it  is  the  antenna.  A syn- 
thetic aperture  can  be  therefore  constructed  by  moving  a real  aperture  or  antenna  through  a 
series  of  positions  along  the  flight  track. 

The net effect is that a SAR system is capable of achieving a resolution independent
of sensor altitude [16] [24]. This characteristic makes SAR an extremely valuable
instrument for space observation. As an active system, SAR provides its own illumination
and does not depend on light from the sun, thus permitting continuous day/night operation.
It also operates successfully in nearly all weather conditions since, depending on the
wavelength, neither fog nor precipitation has a significant effect on microwaves.

There are three common SAR imaging modes: spotlight, stripmap, and scan. During a
spotlight mode data collection, the sensor steers its antenna beam to continuously
illuminate the terrain patch being imaged. In the stripmap mode, antenna pointing is fixed
relative to the flight line, resulting in a moving antenna footprint that sweeps along a strip
of terrain parallel to the path of platform motion. In the scan mode, the sensor steers the
antenna beam to illuminate a strip of terrain at any angle to the path of motion. The scan
mode is a versatile operating mode that encompasses both the spotlight and stripmap modes
as special cases. Because the scan mode involves additional operation and processing
complexity, spotlight and stripmap modes are the most common SAR imaging modes. The
spotlight mode allows fine-resolution data to be collected from localized
areas. The stripmap mode is appropriate for imaging large regions with coarse resolution.

2.2  SAR  Image  Formation 

The high resolution in radar systems can be achieved by a technique called aperture
synthesis [24]. This technique enables much finer resolution than would be
possible with a conventional side-looking radar. In a side-looking radar, an antenna
fixed parallel to the track directs a radar beam broadside and downward from the
platform track, as shown in Figure 3. The ground area that one pulse illuminates is called
the radar's footprint. The beam is scanned by the motion of the platform so that the beam
footprint is swept along a swath on the terrain surface. The dimension of the footprint is
determined by the antenna size, the range, and the transmitted wavelength. With an antenna
length L and a transmitted wavelength λ, the azimuth width of a footprint is approximately
λR/L at a range R [24]. The illumination on the ground does not disappear
outside this region but fades quickly; the width λR/L specifies the 3 dB level where the
power of the footprint is half the maximum power. The radar receives and records the
backscattered energy from the swath surface and generates an image of the surface
reflectivity. The spatial (range and azimuth) resolution of the image is determined by the
pulse width in the range direction and by the radar beam width in the azimuth direction [24].
While the pulse width can be narrowed to achieve a finer range resolution, the length L of
the radar antenna determines the resolution in the azimuth direction of the image: the longer
the antenna, the finer the resolution in this direction. As an example, in order to achieve a
25 meter azimuth resolution from the Seasat satellite with λ = 23.5 cm and R = 850 km, the
required antenna length is (23.5 cm)(850 km)/(25 m) ≈ 8 km. This is obviously a
prohibitively large antenna and is not practical. In conclusion, the goal of SAR is to
synthesize an image with the resolution of a focused large-aperture antenna system using
the data returned from a physically small antenna.
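
The Seasat arithmetic above can be checked with a few lines of Python. This is a minimal sketch that simply inverts the footprint-width relation λR/L; the function name is hypothetical, and the numbers are the ones quoted in the text.

```python
def real_aperture_antenna_length(wavelength_m, slant_range_m, azimuth_resolution_m):
    """Antenna length L for which the footprint width lambda * R / L equals the resolution."""
    return wavelength_m * slant_range_m / azimuth_resolution_m


# Values quoted in the text: lambda = 23.5 cm, R = 850 km, 25 m azimuth resolution.
length_m = real_aperture_antenna_length(0.235, 850e3, 25.0)
print(f"required real-aperture antenna length: {length_m / 1000:.2f} km")
```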


Figure  3 Imaging  geometry  of  a side-looking  aperture  radar. 


2.2.1  Range  Processing 

A real aperture can achieve range resolution by emitting a brief, intense rectangular
pulse, then sampling the returned signal and averaging over time intervals no shorter
than the emitted pulses. That is, the effective duration and energy of the transmitted pulse
determine the range resolution and maximum range of a radar system. Shorter duration
pulses allow closely spaced targets to be discriminated, while high energy pulses provide
measurable reflections from targets at large ranges. In order to avoid the difficult and
expensive development of hardware to generate short duration pulses with high energy,
longer duration pulses are coded for transmission and then compressed at echo
reception.

A linear frequency modulated (FM) waveform of finite duration, called the chirp
waveform, is often used for pulse coding, and a correlation (matched filter)
receiver is used for compression [24]. The frequency modulation enables high range
resolution to be achieved at low transmitter peak power. Functions of the form
$\cos\left(2\pi(ft + \tfrac{1}{2}at^2)\right)$, or more generally
$e^{j2\pi(ft + \frac{1}{2}at^2)}$, compress into very sharp autocorrelations. For the
complex exponential with phase $2\pi(ft + \tfrac{1}{2}at^2)$, the first time derivative
of the phase of the waveform is $2\pi(f + at)$. The frequency of the waveform
changes linearly with slope $a$ as time $t$ increases; the larger the value of $a$, the
faster the frequency changes. Figure 4 depicts a chirp waveform and its autocorrelation
function.


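Figure 4 A chirp waveform and its autocorrelation function.

The autocorrelation $A(\tau)$ of the chirp can be calculated as

$$
\begin{aligned}
A(\tau) &= \int_{T_0}^{T_0+T} e^{-j2\pi\left(ft+\frac{1}{2}at^2\right)}\,
           e^{j2\pi\left(f(t+\tau)+\frac{1}{2}a(t+\tau)^2\right)}\,dt \\
        &= e^{j2\pi\tau\left(f+a\left(T_0+\frac{1}{2}T\right)\right)}\,(T-|\tau|)\,
           \frac{\sin\left[\pi a\tau\,(T-|\tau|)\right]}{\pi a\tau\,(T-|\tau|)}
           \qquad \text{for } -T < \tau < T
\end{aligned}
\tag{1}
$$

where the chirp occupies $T_0 \le t \le T_0 + T$ and the integrand is nonzero only where
the pulse and its shifted copy overlap.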
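
The closed form in (1) can be checked numerically. The sketch below, in pure Python, integrates the product of the conjugated chirp and its shifted copy over their overlap with a midpoint rule and compares the result with the closed-form expression. The function names, the chirp parameters, and the number of integration steps are assumptions chosen only for illustration.

```python
import cmath
import math


def chirp(t, f, a):
    """Complex chirp exp(j 2 pi (f t + a t^2 / 2))."""
    return cmath.exp(2j * math.pi * (f * t + 0.5 * a * t * t))


def autocorr_numeric(tau, f, a, t0, T, steps=20000):
    """Midpoint-rule integral of conj(u(t)) u(t + tau) over the pulse overlap."""
    lo = max(t0, t0 - tau)
    hi = min(t0 + T, t0 + T - tau)
    if hi <= lo:
        return 0j
    dt = (hi - lo) / steps
    total = 0j
    for k in range(steps):
        t = lo + (k + 0.5) * dt
        total += chirp(t, f, a).conjugate() * chirp(t + tau, f, a)
    return total * dt


def autocorr_closed_form(tau, f, a, t0, T):
    """Closed-form autocorrelation: phase factor, triangle weight, and sinc term."""
    w = T - abs(tau)
    if w <= 0:
        return 0j
    x = math.pi * a * tau * w
    sinc = 1.0 if x == 0 else math.sin(x) / x
    return cmath.exp(2j * math.pi * tau * (f + a * (t0 + 0.5 * T))) * w * sinc


# Hypothetical chirp: 10 Hz start frequency, 50 Hz/s slope, 1 s duration.
F0, SLOPE, T_START, DURATION = 10.0, 50.0, 0.0, 1.0
for lag in (-0.5, -0.01, 0.0, 0.013, 0.3):
    num = autocorr_numeric(lag, F0, SLOPE, T_START, DURATION)
    ref = autocorr_closed_form(lag, F0, SLOPE, T_START, DURATION)
    print(f"tau={lag:+.3f}  |numeric - closed form| = {abs(num - ref):.2e}")
```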


The term $T - |\tau|$ is a triangle function weighting the $\sin(x)/x$, or sinc, function.
The main lobe of the autocorrelation function is approximately $2/(aT)$ wide between its
first nulls, and its half-power width is about $1/(aT)$. Note that the product $aT$ is the
bandwidth of the chirp over the pulse duration $T$. The gain in resolution, or pulse
compression ratio, is $T$ divided by $1/(aT)$, or $aT^2$. This is also the time-bandwidth
product of the chirp signal. Thus a high time-bandwidth product is required for high
resolution.

For a pulse shape function $u(t)$ and a received signal of the form
$r(t) = \sigma u(t-\tau)$, the receiver that implements a correlation for complex signals is
given by

$$
y(t) = \int u^*(s)\, r(s+t)\, ds = \sigma \int u^*(s)\, u(s+t-\tau)\, ds = \sigma A(t-\tau)
\tag{2}
$$

where $\sigma$ is the target reflectivity from the range corresponding to time $\tau$. When
the pulse shape $u(t)$ is selected such that its autocorrelation fades quickly as the time
lag increases, the output $y(t)$ of the receiver will be maximum when $t$ equals $\tau$ and
small otherwise. This means that the output of the receiver will have spikes at the time
delays which correspond to reflecting objects.

In general, if there are $N$ reflectors in a target reflecting energy, then there will be $N$
output spikes from the correlation receiver, each one scaled by the reflectivity
of the associated reflector. If $r(t)$ is given by $\sum_{i=1}^{N} \sigma_i u(t-\tau_i)$,
then the output of the correlation receiver is

$$
\begin{aligned}
y(t) &= \int u^*(s)\, r(s+t)\, ds \\
     &= \sum_{i=1}^{N} \sigma_i \int u^*(s)\, u(s+t-\tau_i)\, ds \\
     &= \sum_{i=1}^{N} \sigma_i A(t-\tau_i)
\end{aligned}
\tag{3}
$$
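
The behaviour described by (3) can be illustrated with a small discrete-time simulation in pure Python. The sketch builds a received signal from three overlapping chirp echoes, correlates it with the transmitted pulse, and picks the strongest well-separated peaks. All names and numerical values (sample interval, chirp slope, reflectivities, delays) are hypothetical and chosen only so that the echoes overlap before compression.

```python
import cmath
import math


def chirp_pulse(n_samples, dt, f, a):
    """Sampled chirp u[k] = exp(j 2 pi (f t + a t^2 / 2)) with t = k dt."""
    return [cmath.exp(2j * math.pi * (f * k * dt + 0.5 * a * (k * dt) ** 2))
            for k in range(n_samples)]


def received(pulse, n_total, reflectors):
    """r[n] = sum_i sigma_i * u[n - d_i] for (sigma_i, d_i) in reflectors."""
    r = [0j] * n_total
    for sigma, delay in reflectors:
        for k, u in enumerate(pulse):
            if delay + k < n_total:
                r[delay + k] += sigma * u
    return r


def correlate(pulse, r):
    """y[t] = sum_s conj(u[s]) r[s + t], the discrete correlation receiver."""
    n_out = len(r) - len(pulse) + 1
    return [sum(u.conjugate() * r[s + t] for s, u in enumerate(pulse))
            for t in range(n_out)]


def strongest_peaks(y, count, min_separation):
    """Greedily pick the `count` largest |y[t]| that are at least min_separation apart."""
    order = sorted(range(len(y)), key=lambda t: abs(y[t]), reverse=True)
    peaks = []
    for t in order:
        if all(abs(t - p) >= min_separation for p in peaks):
            peaks.append(t)
        if len(peaks) == count:
            break
    return sorted(peaks)


# 200-sample pulse (0.2 s at 1 kHz sampling), 2000 Hz/s slope: 400 Hz bandwidth.
pulse = chirp_pulse(200, 1e-3, 0.0, 2000.0)
# Three reflectors (reflectivity, delay in samples); the raw echoes overlap in time.
echoes = [(1.0, 100), (0.5, 250), (0.8, 400)]
y = correlate(pulse, received(pulse, 700, echoes))
print(strongest_peaks(y, count=3, min_separation=10))
```

Although each echo is 200 samples long and neighbouring echoes overlap, the compressed output separates them because the chirp autocorrelation is only a few samples wide.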

The output of the receiver can be expressed as the convolution of the received signal
with an impulse response $h(t)$ as follows:

$$
\begin{aligned}
y(t) &= \int u^*(s)\, r(s+t)\, ds \\
     &= \int r(s)\, u^*(-(t-s))\, ds \\
     &= \int r(s)\, h(t-s)\, ds
\end{aligned}
\tag{4}
$$

The impulse response of the linear filter which implements the correlation receiver is
therefore $h(t) = u^*(-t)$. This convolution implementation is descriptively called the
time reversed filter. It can be easily implemented in an existing linear filter architecture
by modifying the impulse response, and the output is then

$$
y(t) = r(t) * u^*(-t)
\tag{5}
$$

where $*$ between two signals denotes convolution. This filter can be implemented in the
frequency domain as

$$
Y(f) = R(f)\,H(f) = R(f)\,U^*(f)
\tag{6}
$$

where $Y(f)$, $R(f)$, and $H(f)$ are the Fourier transforms of $y(t)$, $r(t)$, and $h(t)$
respectively, and $U^*(f)$ is the conjugate of $U(f)$, which is the Fourier transform of
$u(t)$. The complex conjugate operator in the frequency domain corresponds to a complex
conjugate together with time reversal in the time domain [59]. The filter described by the
correlation, time reversed, and conjugated receivers is referred to as the matched filter.
This is because the filter is essentially a reference replica of the transmitted pulse which is
compared to the received signal. When characteristics of the propagation medium are known,
the reference waveform is given the shape of the anticipated received signal. The filter
output is a measure of how precisely the received signal and reference match.
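
The equivalence between the time-domain correlation receiver and the frequency-domain product $R(f)U^*(f)$ can be checked on sampled data. The sketch below is an illustration under stated assumptions: it uses a naive O(n²) discrete Fourier transform written in pure Python, treats the correlation as circular over the record length, and the function names, the short chirp, and the echo delay are all hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n) for k in range(n)) / n
            for m in range(n)]


def matched_filter_freq(r, u):
    """Inverse DFT of R[k] * conj(U[k]) with the pulse zero-padded to the record length."""
    n = len(r)
    u_padded = list(u) + [0j] * (n - len(u))
    R, U = dft(r), dft(u_padded)
    return idft([R[k] * U[k].conjugate() for k in range(n)])


def correlate_time(r, u):
    """Circular correlation y[t] = sum_s conj(u[s]) r[(s + t) mod n]."""
    n = len(r)
    return [sum(u[s].conjugate() * r[(s + t) % n] for s in range(len(u)))
            for t in range(n)]


# Hypothetical 16-sample chirp and a single echo of reflectivity 0.7 at delay 20.
u = [cmath.exp(1j * math.pi * 0.02 * k * k) for k in range(16)]
r = [0j] * 64
for k, sample in enumerate(u):
    r[20 + k] += 0.7 * sample

y_freq = matched_filter_freq(r, u)
y_time = correlate_time(r, u)
max_diff = max(abs(p - q) for p, q in zip(y_freq, y_time))
peak = max(range(len(y_freq)), key=lambda t: abs(y_freq[t]))
print(peak, max_diff)
```

In practice a fast Fourier transform would replace the naive transform; the naive version is used here only to keep the example self-contained.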

2.2.2 Azimuth Processing


The range resolution in a radar system is determined by the type of pulse coding
and the way in which the return from each pulse is processed. All radar systems,
conventional radars and SARs alike, resolve targets in the range dimension in the same way.
It is the resolution of targets in the azimuth dimension that distinguishes a SAR from other
radar systems. The principle of SAR is to store successive echoes received by a moving radar
from targets on the ground, and to process them to synthesize a long aperture, thereby
achieving high azimuth resolution [17].

A radar with an antenna length $L_a$ in the azimuth direction generates a radar beam
that has an angular spread of $\theta_a = \lambda/L_a$, as shown in Figure 5. Two point
targets on the ground separated by an amount $\delta x$ in the azimuth direction at a slant
range $R$ can be resolved only if they are not both in the radar beam at the same time, so
that

$$
\delta x = R\,\theta_a = \frac{R\lambda}{L_a}
\tag{7}
$$

This is the resolution limit of a conventional side-looking real aperture radar. It is clear
from this that the azimuth resolution capability of the conventional radar varies inversely
with the physical antenna size, becoming finer for increasing antenna length and
degrading with increased slant range $R$.

Figure 5 Aperture synthesis. The point target is in the beam for a time $T_s = L_s/v$.
After phase correction of the signals, a synthetic antenna pattern is obtained which is
equivalent to that of a conventional antenna of length $2L_s$.

In SAR operation, consider a radar beam that sweeps over an arbitrary
target as the platform flies over the scene in Figure 5. The point target remains in the beam
for a certain time interval $T_s$. During this time interval the radar transmits pulses at a
certain rate (the pulse repetition frequency, or PRF) and also receives backscatter from the
point target during the intervals between successive pulse transmissions. Therefore,
after the time interval $T_s$, a collection of backscattered returns is built up which spans
a spatial interval $L_s$, equal to the beam width:

$$
L_s = \delta x = \frac{R\lambda}{L_a}
\tag{8}
$$

The backscatter from the point target is distributed over a large number of aperture
positions along the track's spatial extent. A large antenna aperture is difficult to
implement physically, but it can be synthesized by sequentially gathering the backscatter
with a small antenna at different positions which collectively define the antenna array.

The slant range $R$ from the radar to a point target can be written

$$
R = \sqrt{R_0^2 + x^2} \approx R_0\left(1 + \frac{x^2}{2R_0^2}\right)
\qquad \text{if } R_0 \gg x
\tag{9}
$$


where $R_0$ is the slant range when the target is perpendicular to the flight line,
$x = vt$ is the along-track position of the platform moving at speed $v$, and $t$ is the
elapsed time from when the platform passed its closest point of approach to the point
target. Hence the phase shift between the transmitted and received signals is

$$
\phi = -\frac{2\pi}{\lambda} \times 2R = \phi_0 - \frac{2\pi v^2 t^2}{\lambda R_0}
\tag{10}
$$

where $\lambda$ is the radar wavelength and $\phi_0 = -4\pi R_0/\lambda$. The Doppler
frequency shift between the transmitted and received signals is given by

$$
f_D = \frac{1}{2\pi}\frac{d\phi(t)}{dt} = -\frac{2v^2 t}{\lambda R_0}
\tag{11}
$$


From (11), the azimuth spread of the point target response approximates a linear
frequency modulated waveform. The Doppler frequency shift is highest when the point target
enters the radar beam. It decreases with time until it becomes negative and reaches a
minimum just before the point target moves out of the area of illumination.

The azimuth resolution needed to resolve two adjacent point targets in the azimuth
direction is determined by

$$
\Delta f_D = \frac{2v\,\Delta x}{\lambda R_0}
\tag{12}
$$

and

$$
\Delta x = \frac{\lambda R_0}{2v}\,\Delta f_D
\tag{13}
$$


The Doppler resolution of the processing is the reciprocal of the time $T_s$ taken to
synthesize the aperture, which is

$$
T_s = \frac{L_s}{v} = \frac{R_0\lambda}{vL_a}
\tag{14}
$$

The finest achievable azimuth resolution is therefore the value of $\Delta x$ which
corresponds to a Doppler bandwidth of $1/T_s$:

$$
\Delta x = \frac{\lambda R_0}{2v} \times \frac{vL_a}{R_0\lambda} = \frac{L_a}{2}
\tag{15}
$$

Equation (15) implies that the azimuth resolution improves as the antenna length decreases.
However, shorter antennas require more power for signal transmission and a longer
synthetic aperture.
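
The chain from the synthetic aperture length to the azimuth resolution of half the antenna length can be traced numerically. The following Python sketch is illustrative only: the wavelength and slant range are the Seasat values quoted earlier in the chapter, while the 10 m antenna length and the 7500 m/s platform speed are assumed values, and all function names are hypothetical.

```python
def synthetic_aperture_length(wavelength_m, slant_range_m, antenna_length_m):
    """L_s = R * lambda / L_a, the along-track extent over which a target is illuminated."""
    return slant_range_m * wavelength_m / antenna_length_m


def aperture_time(wavelength_m, slant_range_m, antenna_length_m, speed_m_s):
    """T_s = L_s / v, the time taken to synthesize the aperture."""
    return synthetic_aperture_length(wavelength_m, slant_range_m, antenna_length_m) / speed_m_s


def doppler_shift(t_s, wavelength_m, slant_range_m, speed_m_s):
    """f_D = -2 v^2 t / (lambda R_0), with t measured from closest approach."""
    return -2.0 * speed_m_s ** 2 * t_s / (wavelength_m * slant_range_m)


def sar_azimuth_resolution(wavelength_m, slant_range_m, antenna_length_m, speed_m_s):
    """Delta x = (lambda R_0 / 2v) * (1 / T_s), which reduces to L_a / 2."""
    t_syn = aperture_time(wavelength_m, slant_range_m, antenna_length_m, speed_m_s)
    return wavelength_m * slant_range_m / (2.0 * speed_m_s) / t_syn


WAVELENGTH, RANGE0 = 0.235, 850e3   # values quoted in the text
ANTENNA, SPEED = 10.0, 7500.0       # assumed values for illustration

t_syn = aperture_time(WAVELENGTH, RANGE0, ANTENNA, SPEED)
print("synthetic aperture length (m):", synthetic_aperture_length(WAVELENGTH, RANGE0, ANTENNA))
print("Doppler at beam entry (Hz):", doppler_shift(-t_syn / 2, WAVELENGTH, RANGE0, SPEED))
print("Doppler at beam exit (Hz):", doppler_shift(t_syn / 2, WAVELENGTH, RANGE0, SPEED))
print("azimuth resolution (m):", sar_azimuth_resolution(WAVELENGTH, RANGE0, ANTENNA, SPEED))
```

Note that the slant range and the platform speed cancel in the final resolution, which is why the result depends only on the physical antenna length.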


2.3 ISAR Image Formation

Besides the three SAR imaging modes mentioned earlier, there is a fourth operating
mode called inverse SAR (ISAR). This mode produces radar signal data similar to
that of spotlight mode SAR. However, the ISAR mode is different in that data collection is
accomplished with the radar stationary and the target moving. The signals are similar
because they are determined by the relative position and motion between the sensor and the
scene being imaged. Since the signals are similar, the processing required to produce an
image is also similar.

Therefore, ISAR imaging refers to the use of target motion alone to generate a
synthetic aperture for azimuth resolution. ISAR imaging can be accomplished by rotating the
platform (turntable) on which a target is placed. The ISAR operation is very useful for
collecting the RCS (radar cross section) information of a target over many different
aspect angles at different depression angles.

ISAR uses the same carrier term as in the SAR case, but the difference is that the
motion of a target relative to the radar platform is rotational as opposed to linear. The
resulting Doppler phase term due to rotation can be linearized as a function of azimuth
range position. Figure 6 displays an ISAR imaging scenario. The radar is stationary as the
target rotates at a constant angular rotation rate ω in rad/s on a turntable. As a scatterer at
a point x on the x-axis is rotated through an angle Δθ, the change in the Doppler phase is


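The azimuth resolution needed to resolve two scatterers on the x-axis for a small
viewing-angle rotation $\Delta\theta$ is obtained for $\Delta f_D = 1/T$ as

$$
\Delta r_c = \frac{\lambda}{2\omega}\,\Delta f_D
           = \frac{1}{2}\frac{\lambda}{\omega T}
           = \frac{1}{2}\frac{\lambda}{\Delta\theta}
\tag{17}
$$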
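
A short numerical example may help to fix the scale of (17). The sketch below evaluates the cross-range resolution both from the Doppler resolution $1/T$ and directly from the total rotation angle $\Delta\theta = \omega T$. The 3 cm wavelength, the rotation rate, and the integration time are hypothetical values, not parameters of the data used in this study.

```python
def isar_resolution_from_doppler(wavelength_m, omega_rad_s, integration_time_s):
    """Delta r_c = (lambda / (2 omega)) * Delta f_D with Delta f_D = 1 / T."""
    doppler_resolution_hz = 1.0 / integration_time_s
    return wavelength_m / (2.0 * omega_rad_s) * doppler_resolution_hz


def isar_resolution_from_angle(wavelength_m, rotation_rad):
    """Delta r_c = lambda / (2 Delta theta)."""
    return wavelength_m / (2.0 * rotation_rad)


# Hypothetical turntable collection: 3 cm wavelength, 0.02 rad/s for 2.5 s (0.05 rad).
WAVELENGTH, OMEGA, DWELL = 0.03, 0.02, 2.5
print(isar_resolution_from_doppler(WAVELENGTH, OMEGA, DWELL))
print(isar_resolution_from_angle(WAVELENGTH, OMEGA * DWELL))
```

Both forms give the same value because the resolution depends only on the wavelength and the total angle through which the target is viewed, not on the rotation rate and dwell time separately.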


The  range  resolution  for  ISAR  is  obtained  by  using  wideband  waveforms  as  with 


SAR. 

Figure 7 illustrates ISAR imaged vehicles from the TABILS 24 ISAR data set. The
radar used for the data set is a fully polarimetric, Ka-band radar. In Figure 7a and Figure
7b, the MV0015 and MV0095 vehicles are shown at depression angles of 20° and 15°
respectively, each with aspect angles of 0°, 40°, 80°, and 120°. Target scattering looks
very different, depending on the azimuth angles.
Figure 7 Examples of ISAR imagery. Down range is increasing from left to right.

2.4 Preprocessing

Before detection and recognition tasks are performed on an image, image enhancement
is often necessary to improve target detection/recognition performance. Image
enhancement is therefore viewed as a preprocessing step that precedes the ATD/R tasks.

With the availability of fully polarimetric high-resolution SAR imagery, several
image enhancement techniques have been developed, exploiting the polarization
scattering characteristics of targets and background clutter. SAR processing allows for
high-resolution images, but introduces a considerable amount of speckle in the image due
to the coherent nature of the imaging process. The primary goal of preprocessing in
polarimetric SAR imagery is to reduce image speckle and to improve target-to-clutter
contrast.

Although many image enhancement techniques have been developed, polarimetric
enhancement techniques are particularly desirable because they can provide significant
speckle reduction and target-to-clutter ratio improvement while preserving the resolution
of the original SAR imagery.

Novak et al. developed the polarimetric whitening filter (PWF), which combines
polarimetric measurements to produce an intensity image having minimum speckle [53] [54]
[55] [56]. This improvement led to enhanced target detection performance [14] [53],
clutter segmentation ability [10], and texture discrimination ability [9]. The detection
algorithms discussed and developed in this dissertation are tested and evaluated on
PWF-processed SAR imagery. The PWF technique developed by Novak et al. is introduced in
the following sections.

2.4.1  Polarimetric  Clutter  Model 

A mathematical model is used to characterize fully polarimetric radar returns from
ground clutter. When operating in a linear polarization basis, a synthetic aperture radar
uses four polarizations (HH, HV, VH, and VV) to measure the full polarization scattering
matrix for any such clutter region. Since HV has a reciprocity relationship with VH,
the set of three polarizations (HH, HV, and VV) contains all the information in the
polarization scattering matrix.

Ground clutter and sea clutter are spatially nonhomogeneous, so it is more realistic to
describe such clutter with non-Gaussian models. The polarimetric measurement Y for each
SAR pixel in ground clutter is expressed by three complex elements: HH, HV, and VV.


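$$ Y = \begin{bmatrix} HH \\ HV \\ VV \end{bmatrix} = \begin{bmatrix} HH_I + jHH_Q \\ HV_I + jHV_Q \\ VV_I + jVV_Q \end{bmatrix} \tag{18} $$

where $HH_I$ and $HH_Q$ are the in-phase and quadrature components of the complex HH
measurement. Y is assumed to be the product of a complex Gaussian vector X and a
spatially varying texture variable $\sqrt{g}$,

$$ Y = \sqrt{g}\,X \tag{19} $$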

The vector X is assumed to be zero-mean (due to the random absolute phase of its
components) and complex Gaussian. Hence, the probability density function of the vector
X is given by

$$ f(X) = \frac{1}{\pi^{3}\,|\Sigma|} \exp\left(-X^{\dagger}\Sigma^{-1}X\right) \tag{20} $$

where the symbol $\dagger$ represents the operation of Hermitian transposition, and
$\Sigma = E(XX^{\dagger})$ is the polarization covariance matrix of X. In general, the
polarization covariance matrix in a homogeneous region of clutter takes the form


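$$ \Sigma = \sigma_{HH} \begin{bmatrix} 1 & 0 & \rho\sqrt{\gamma} \\ 0 & \varepsilon & 0 \\ \rho^{*}\sqrt{\gamma} & 0 & \gamma \end{bmatrix} \tag{21} $$

where

$$ \sigma_{HH} = E(|HH|^{2}) \tag{22} $$

$$ \varepsilon = \frac{E(|HV|^{2})}{E(|HH|^{2})} \tag{23} $$

$$ \gamma = \frac{E(|VV|^{2})}{E(|HH|^{2})} \tag{24} $$

$$ \rho = \frac{E(HH \cdot VV^{*})}{\left[E(|HH|^{2})\,E(|VV|^{2})\right]^{1/2}} \tag{25} $$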

It is assumed that the product multiplier $g$ is a gamma-distributed random variable.
This assumption is not universal: the log-normal and Weibull models are also widely used.
The gamma-distributed random variable $g$ has a probability density function of the form

$$ f_G(g) = \frac{1}{\bar{g}\,\Gamma(\nu)} \left(\frac{g}{\bar{g}}\right)^{\nu-1} \exp\left(-\frac{g}{\bar{g}}\right) \tag{26} $$

where the parameters $\bar{g}$ and $\nu$ are related to the mean and the variance of the
random variable $g$:

$$ E(g) = \bar{g}\,\nu \tag{27} $$

$$ E(g^{2}) = \bar{g}^{2}\,\nu(\nu+1) \tag{28} $$

where $E$ is the statistical mean operator. Therefore, the resulting probability density
function of Y is a generalized K-distribution, which is expressed in terms of the modified
Bessel function [52].

Note that when $\bar{g} = 1/\nu$, so that the mean of the texture variable is unity, the
non-Gaussian model reduces to the Gaussian model in the limit as $\nu \to \infty$.
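The Gaussian limit can be checked numerically. The sketch below draws gamma texture samples with shape $\nu$ and scale $\bar{g} = 1/\nu$ using only the Python standard library; with this choice the mean stays at one while the variance, $\bar{g}^{2}\nu = 1/\nu$ from (27) and (28), shrinks as $\nu$ grows, so the texture tends to a constant. The function names, sample size, and seed are illustrative assumptions.

```python
import random


def sample_texture(nu, n, rng):
    """Draw n gamma texture values with shape nu and scale 1/nu (unit mean)."""
    g_bar = 1.0 / nu
    return [rng.gammavariate(nu, g_bar) for _ in range(n)]


def mean_and_variance(values):
    """Sample mean and (population) sample variance of a list of numbers."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m, v


rng = random.Random(0)  # fixed seed so the run is repeatable
for nu in (1.0, 10.0, 100.0):
    m, v = mean_and_variance(sample_texture(nu, 20000, rng))
    print(f"nu={nu:6.1f}  mean={m:.3f}  variance={v:.4f}  (1/nu={1.0 / nu:.4f})")
```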

The assumption that HV is uncorrelated with HH and VV is not always true (especially
for man-made targets or for a polarimetric SAR with cross-talk between channels)
but is valid for ground clutter [53].

2.4.2  Polarimetric  Whitening  Filter  (PWF) 

The mathematical model established in the previous section is now used for processing
the polarimetric measurements HH, HV, and VV to form an enhanced SAR intensity
image. An optimal processor known as the PWF is derived, which combines the
polarimetric measurements to produce an intensity image having a minimum amount of speckle.
A quadratic processing of the polarimetric measurement to an intensity image is
constructed as follows,

$$ y = Y^{\dagger}AY = g\,X^{\dagger}AX \tag{30} $$

where the weighting matrix A is assumed to be Hermitian symmetric and nonnegative
definite, and $g$ is a spatially varying texture variable. The objective is to find the
optimal weighting matrix which leads the quadratic processing of the polarimetric
measurement to produce an intensity image with a minimum amount of speckle. The ratio of
the standard deviation of the image pixel intensities to the mean of the image pixel
intensities is used as a measure of speckle and is given by

$$ \frac{s}{m} = \frac{\text{standard deviation of } y}{\text{mean of } y} = \frac{\sqrt{\mathrm{VAR}(Y^{\dagger}AY)}}{E(Y^{\dagger}AY)} \tag{31} $$

where VAR is the variance.

Instead of minimizing the speckle amount $s/m$, we minimize the square of the speckle
amount, $(s/m)^{2}$:

$$ \left(\frac{s}{m}\right)^{2} = \frac{\mathrm{VAR}(Y^{\dagger}AY)}{E^{2}(Y^{\dagger}AY)} = \frac{\mathrm{VAR}(g\,X^{\dagger}AX)}{E^{2}(g\,X^{\dagger}AX)} \tag{32} $$


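We use the following useful results:

$$ E(X^{\dagger}AX) = \mathrm{tr}(\Sigma A) = \sum_{i=1}^{3} \lambda_i \tag{33} $$

$$ \mathrm{VAR}(X^{\dagger}AX) = \mathrm{tr}\left[(\Sigma A)^{2}\right] = \sum_{i=1}^{3} \lambda_i^{2} \tag{34} $$

where tr is the trace, and $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the eigenvalues of
the matrix $\Sigma A$. With these results, the square of the $s/m$ ratio can be written as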


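$$ \left(\frac{s}{m}\right)^{2} = \frac{E(g^{2})}{E^{2}(g)} \cdot \frac{\mathrm{VAR}(X^{\dagger}AX)}{E^{2}(X^{\dagger}AX)} + \frac{\mathrm{VAR}(g)}{E^{2}(g)} = \frac{\nu+1}{\nu} \cdot \frac{\sum_{i=1}^{3}\lambda_i^{2}}{\left(\sum_{i=1}^{3}\lambda_i\right)^{2}} + \frac{1}{\nu} \tag{35} $$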
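The step from (32) to (35) is not spelled out in the text. A short sketch of it follows, assuming that the texture $g$ is statistically independent of X and writing $q = X^{\dagger}AX$ as shorthand (the symbol $q$ is introduced here only for this derivation):

```latex
\begin{aligned}
E(gq) &= E(g)\,E(q), \\
\mathrm{VAR}(gq) &= E(g^{2})\,E(q^{2}) - E^{2}(g)\,E^{2}(q) \\
                 &= E(g^{2})\left[\mathrm{VAR}(q) + E^{2}(q)\right] - E^{2}(g)\,E^{2}(q) \\
                 &= E(g^{2})\,\mathrm{VAR}(q) + \mathrm{VAR}(g)\,E^{2}(q).
\end{aligned}
```

Dividing the last line by $E^{2}(g)\,E^{2}(q)$ gives the two terms of (35). For the gamma texture, (27) and (28) give $E(g^{2})/E^{2}(g) = (\nu+1)/\nu$ and $\mathrm{VAR}(g)/E^{2}(g) = 1/\nu$, so only the ratio $\mathrm{VAR}(q)/E^{2}(q)$ depends on the weighting matrix A.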


Note that $\nu$ is a constant in (35), and minimizing $(s/m)^{2}$ is equivalent to minimizing

$$ \frac{\sum_{i=1}^{3} \lambda_i^{2}}{\left(\sum_{i=1}^{3} \lambda_i\right)^{2}} \tag{36} $$


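Note from (35) that if the set $\{\lambda_1, \lambda_2, \lambda_3\}$ yields a minimum for
$(s/m)^{2}$, then so does the set $\{\alpha\lambda_1, \alpha\lambda_2, \alpha\lambda_3\}$
for any nonzero scalar $\alpha$. Therefore, we can minimize (35) by minimizing the numerator
$\sum_{i=1}^{3}\lambda_i^{2}$ of (36) subject to the arbitrary constraint
$\sum_{i=1}^{3}\lambda_i = 3$ on its denominator. This modified optimization problem can be
solved with the method of Lagrange multipliers. Using a Lagrange multiplier $\beta$, we
minimize the unconstrained functional

$$ f(\lambda_1, \lambda_2, \lambda_3) = \sum_{i=1}^{3} \lambda_i^{2} - \beta\left[\left(\sum_{i=1}^{3} \lambda_i\right)^{2} - 9\right] \tag{37} $$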

Taking partial derivatives with respect to $\lambda_i$ and setting the results equal to
zero yields

$$ \frac{\partial f}{\partial \lambda_i} = 2\lambda_i - 2\beta \sum_{k=1}^{3} \lambda_k = 0 \quad \text{for } i = 1, 2, 3 \tag{38} $$

Thus we find that

$$ \beta = \frac{\lambda_1}{\sum_{i=1}^{3}\lambda_i} = \frac{\lambda_2}{\sum_{i=1}^{3}\lambda_i} = \frac{\lambda_3}{\sum_{i=1}^{3}\lambda_i} \tag{39} $$

The above result (together with the condition $\sum_{i=1}^{3}\lambda_i = 3$) implies that a
minimizing solution is

$$ \lambda_1 = \lambda_2 = \lambda_3 = 1 \tag{40} $$


(40) leads to the following result:

$$ \Sigma A = I \tag{41} $$

$$ A = \Sigma^{-1} \tag{42} $$

(40) and (41) imply that the optimal weighting matrix A is the one that makes all of the
eigenvalues of $\Sigma A$ equal to one. The minimum speckle intensity image is therefore
constructed as

$$ y = g\,X^{\dagger}\Sigma^{-1}X \tag{43} $$


This solution can be interpreted as a polarimetric whitening filter. That is, the polarimetric
measurement Y is transformed to a new coordinate system by the filter $\Sigma^{-1/2}$ to obtain

$$ W = \Sigma^{-1/2}\,Y $$

This transform whitens the polarimetric measurements, so that

$$ \Sigma_W = E(WW^{\dagger}) = g\,\Sigma^{-1/2}\,E(XX^{\dagger})\,\Sigma^{-1/2} \tag{44} $$

$$ \Sigma_W = g\,\Sigma^{-1/2}\,\Sigma\,\Sigma^{-1/2} = g\,I \tag{45} $$

The minimum speckle image is then obtained by noncoherently averaging the power in the
elements of the whitened vector W, as shown by

$$ y = \sum_{i=1}^{3} |W_i|^{2} = W^{\dagger}W \tag{46} $$


In conclusion, the PWF changes the polarimetric basis from a linear polarimetric basis
to a new basis given by

$$ \left[\, HH,\ \frac{HV}{\sqrt{\varepsilon}},\ \frac{VV - \rho^{*}\sqrt{\gamma}\,HH}{\sqrt{\gamma\,(1 - |\rho|^{2})}} \,\right] \tag{47} $$

In this new basis, the polarimetric channels are uncorrelated and have equal expected
power. Thus the optimal way to reduce speckle polarimetrically is to sum the powers
noncoherently in these polarimetric channels. Another advantage of the PWF-processed image
is that it merges into a single image the scatterers that show up in only one of the
polarization channels. Therefore, the PWF image is very useful for locating the features
that might ultimately be used in a recognition task. Figure 8 summarizes the PWF
processing procedure.
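The whitening-and-summing procedure can be sketched in a few lines of standard-library Python. The sketch assumes the clutter model of Section 2.4.1, in which HV is uncorrelated with HH and VV, so that the whitened channels are HH, HV/√ε, and (VV − ρ*√γ·HH)/√(γ(1 − |ρ|²)). The function names and the sample pixel values are hypothetical, and the overall scale factor E(|HH|²) is omitted because it only rescales the image.

```python
import math


def estimate_clutter_parameters(pixels):
    """Estimate (sigma_hh, eps, gamma, rho) from (HH, HV, VV) clutter samples."""
    n = len(pixels)
    p_hh = sum(abs(hh) ** 2 for hh, _, _ in pixels) / n
    p_hv = sum(abs(hv) ** 2 for _, hv, _ in pixels) / n
    p_vv = sum(abs(vv) ** 2 for _, _, vv in pixels) / n
    cross = sum(hh * vv.conjugate() for hh, _, vv in pixels) / n
    rho = cross / math.sqrt(p_hh * p_vv)  # complex HH-VV correlation coefficient
    return p_hh, p_hv / p_hh, p_vv / p_hh, rho


def pwf_intensity(hh, hv, vv, eps, gamma, rho):
    """Sum the powers of the three whitened channels for one pixel."""
    w1 = hh
    w2 = hv / math.sqrt(eps)
    w3 = (vv - rho.conjugate() * math.sqrt(gamma) * hh) / math.sqrt(
        gamma * (1.0 - abs(rho) ** 2)
    )
    return abs(w1) ** 2 + abs(w2) ** 2 + abs(w3) ** 2


# Hypothetical clutter samples (HH, HV, VV), for illustration only.
clutter = [
    (1 + 1j, 0.2 - 0.1j, 0.5 + 0.9j),
    (0.3 - 0.7j, -0.1 + 0.2j, 0.2 - 0.4j),
    (-0.8 + 0.1j, 0.05 + 0.3j, -0.6 + 0.3j),
]
sigma_hh, eps, gamma, rho = estimate_clutter_parameters(clutter)
image = [pwf_intensity(hh, hv, vv, eps, gamma, rho) for hh, hv, vv in clutter]
print(image)
```

With ε = γ = 1 and ρ = 0 the function reduces to the plain sum of the three channel powers, which is a convenient sanity check.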


2.4.3  Preprocessing  SAR  Image 

The  focus  of  attention  of  the  ATD/R  system  (Figure  2)  is  deployed  on  the  PWF  SAR 
image  whose  primary  characteristic  is  that  the  speckle  in  the  SAR  imagery  is  optimally 
reduced  [54].  Therefore,  the  input  to  preprocessing  is  a sequence  of  fully  polarimetric 
images  where  each  polarimetric  image  in  the  sequence  is  a set  of  three  complex-valued 
images  denoted  by  HH,  HV  and  VV  when  the  images  are  expressed  in  a linear  polarization 
basis. Each set of three complex-valued (HH, HV, VV) pixels is optimally combined and
transformed to the real-valued pixel intensity y by the PWF.

Figure 8 Minimum-speckle image processing [55]. Step 1, whitening the original image: the
whitening filter maps (HH, HV, VV) to
$W = \left(HH,\ HV/\sqrt{\varepsilon},\ (VV - \rho^{*}\sqrt{\gamma}\,HH)/\sqrt{\gamma(1-|\rho|^{2})}\right)$.
Step 2, computing a minimum speckle image:
$y = |HH|^{2} + |HV/\sqrt{\varepsilon}|^{2} + |(VV - \rho^{*}\sqrt{\gamma}\,HH)/\sqrt{\gamma(1-|\rho|^{2})}|^{2}$.


Each pixel intensity y is related to the vector Y = (HH, HV, VV) of corresponding complex
polarization values HH, HV, and VV by the quadratic relation $y = Y^{\dagger}AY$, where
$\dagger$ denotes the complex conjugate transposition and the matrix A is determined such
that the speckle amount is minimum. Finding A requires that the polarization covariance
matrix of the surrounding clutter be found.

The mission 90 pass 5 SAR imagery used in this study is a strip-mode, fully
polarimetric image with a linear-polarization basis. The scrub region located in the
vicinity of the powerline towers in frame 105 of the mission 90 pass 5 SAR data set was
used for computing the covariance matrix of a typical clutter background and for the PWF
processing. The estimated polarimetric covariance matrix is reported to be [55]

$$ \Sigma_C = 0.098 \cdot \begin{bmatrix} 1.00 + j0.00 & -0.01 + j0.02 & 0.60 - j0.05 \\ -0.01 - j0.02 & 0.19 + j0.00 & -0.00 - j0.00 \\ 0.60 + j0.05 & -0.00 + j0.00 & 1.08 + j0.00 \end{bmatrix} \tag{48} $$



2.5  SAR  Image  Visualization 

The input image data often need to be visualized on a display screen in order to observe
features of the data such as trees, shadows, and buildings. Visualization of the data is
also required in the target embedding, the preparation of training and testing data for the
detectors, and the selection of regions of interest.

Khoros [43], an integrated software development environment for information processing
and visualization, was used for these purposes. The Khoros worksheet in Figure 9 brings up
interactive display glyphs with pan and zoom capabilities. This worksheet was created for
display purposes. The worksheet loads frames and concatenates them in order. Through the
interactive display, a user can scroll through the entire plane of the loaded images and
zoom in and out on the regions of interest. Reading the current cursor position provides
the corresponding coordinates (x, y) of the image.

Figure 9 displays an example of visualizing two frames of the SAR images in the
Khoros working environment. One frame in the Mission 90 pass 5 SAR data set has a size
of 2048 and 512 pixels in azimuth and cross range. Each pixel is an 8-bit integer value
which ranges from 0 to 255. The pixel values were linearly converted from the PWF
transformed data, which mostly lie between -50 dB and +30 dB.
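The conversion to 8-bit pixel values can be sketched as follows. This is a minimal illustration, assuming that values outside the -50 dB to +30 dB interval are clipped and that the result is rounded to the nearest integer; the text does not state how out-of-range values or rounding were actually handled, and the function names are hypothetical.

```python
import math


def db_to_uint8(value_db, lo_db=-50.0, hi_db=30.0):
    """Map a dB value linearly onto 0..255, clipping outside [lo_db, hi_db]."""
    frac = (value_db - lo_db) / (hi_db - lo_db)
    frac = min(1.0, max(0.0, frac))  # assumed clipping of out-of-range values
    return int(round(255.0 * frac))


def intensity_to_uint8(intensity, floor=1e-12):
    """Convert a linear PWF intensity to dB, then to an 8-bit pixel value."""
    # The floor avoids log10(0); its value is an arbitrary assumption.
    return db_to_uint8(10.0 * math.log10(max(intensity, floor)))


for db in (-60.0, -50.0, 0.0, 30.0):
    print(db, db_to_uint8(db))
```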

Figure 9 Khoros worksheet for a display purpose.

2.6 Examples of SAR Images

Radar images are composed of many picture elements (pixels). Each pixel in the radar
image represents the radar backscatter; brighter areas represent high backscatter. Bright
features mean that a large fraction of a transmitted radar signal was reflected back to the
radar receiver, while dark features imply very little reflection back from targets.
Backscatter for a target area at a particular wavelength depends on a variety of
conditions: the size of scatterers in the target area, moisture content of the target area,
polarization of the pulses transmitted, the wavelength used in the radar transmitter, and
observation angles.

Backscatter also depends on the use of different polarizations. Since polarimetric
SARs measure the phases of incoming pulses, the phase differences (in degrees) in the
return of HH and VV signals are frequently the result of structural characteristics of the
scatterers.

Figure  10  displays  some  SAR  imagery  preprocessed  by  the  PWF.  In  general,  the 
higher  or  brighter  the  backscatter  on  the  image,  the  rougher  the  surface  being  imaged.  Flat 
surfaces  that  reflect  little  or  no  radar  transmitted  signal  back  towards  the  radar  receiver 
will  always  appear  dark  in  radar  images. 

Figure  10a  shows  a highway  and  a bridge.  The  surfaces  of  the  highway  on  the  bridge 
appear  dark  since  they  are  flat.  Surfaces  inclined  towards  the  radar  usually  have  a stronger 
backscatter  than  surfaces  which  slope  away  from  the  radar  and  tend  to  appear  brighter  in  a 
radar  image. 

Figure  10b  displays  some  houses  and  parked  cars.  The  roof  surfaces  inclined  toward 
the  SAR  appear  much  brighter  than  the  surfaces  inclined  away  from  the  radar.  Some  areas 
not  illuminated  by  a radar,  such  as  the  back  slope  of  mountains,  appear  dark  and  are  called 
the  radar  shadow.  As  an  example  of  this,  the  houses  (in  Figure  10b)  display  radar  shadows 
at  the  back  sides.  Received  radar  pulses  that  bounced  off  of  several  objects  appear  very 
bright  (white)  in  radar  images. 

Vegetation is typically rough on the scale of most radar wavelengths and appears light
grey in a radar image. Figure 10c displays scenes of powerline towers, trees and scrub.
Two  pairs  of  powerline  towers  are  visible  in  the  upper-left  and  the  lower-right  sides  of  the 
picture.  A narrow  scrub  region  running  diagonally  through  the  picture  appears  moderately 
coarse  and  the  trees  divided  by  the  narrow  scrub  region  look  rough  so  that  the  trees  are 
almost  individually  discernible. 

Buildings which do not line up so that the radar pulses are reflected straight back will
appear light grey, like very rough surfaces. Backscatter is also sensitive to the target's
electrical properties, including water content. Wetter objects will appear bright, and
drier targets will appear dark. The exception to this is a smooth body of water, which
reflects incoming pulses away from the radar; these bodies will appear dark.

2.7  Target  Embedding  Strategy 
2.7.1  Development  of  a Target  Embedding  Method 

ATR systems employ detection/recognition algorithms over the sensor imagery. A target
in the imagery can exhibit an unlimited number of different shapes, depending on the
depression and aspect angles at which it is viewed by the SAR. To test the performance of
an ATR system reliably, one would have to place targets with many different aspect angles
at many different locations over a terrain and image the terrain, which is very difficult
and impractical.

ISAR  operation  provides  rich  shape  information  about  a target  with  different  view 
angles. 

Figure 10 Some examples of SAR images from the Mission 90 pass 5 SAR data set.
The images were acquired at an altitude of 2 km with a depression angle of 22.5°
and a slant range of 7 km. The HH, HV and VV returns of the images were combined
to produce a minimum speckle image via PWF processing. The radar sensor is
located at the top of each image, looking down so that the radar shadows go downward.

By employing an appropriate method of target embedding, targets can be placed at
many different locations with many different view angles. Figure 11 depicts a target
embedding methodology. The methodology developed for target embedding proceeds as
follows: (1) appropriate locations for target embedding are selected such that targets are
placed in the clear and in between large scattering centers in the regions of cultural
clutter; (2) in order to combine the circular-polarization-basis ISAR target images with
the linear-polarization-basis image (Mission 90 Pass 5 data set), a polarization basis
transformation is applied to the ISAR target images, resulting in the corresponding
linear-polarization-basis target images; (3) the clutter image and the target image, two
different images having the same polarization basis and the same resolution and generated
at the same radar wavelength, are coherently added at the locations selected for target
embedding; (4) the PWF transformation is finally applied to the new image to which the
targets have been coherently added.


Figure 11 Target Embedding Procedure. Inputs: 35 GHz Mission 90 Pass 5 SAR data and
35 GHz TABILS 24 ISAR target data. Output: SAR data with targets embedded.


The coherent addition means that, for example, the in-phase component ($HH_I$) and
the quadrature component ($HH_Q$) of a target pixel are added into the in-phase component
($HH_I$) and quadrature component ($HH_Q$) of the clutter pixel, respectively, at a
location selected for target embedding. This procedure is also applied to the other two
components (HV and VV) of the pixel.
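Because adding the in-phase and quadrature parts separately is the same as adding complex numbers, the coherent addition can be sketched very compactly. The helper below is a hypothetical illustration, not the tool used to build the data set: it adds a complex target chip into a copy of one complex clutter channel and would be called once each for HH, HV, and VV.

```python
def embed_target(clutter, target, row, col):
    """Coherently add a complex target chip into a copy of a clutter channel.

    clutter, target: lists of rows of complex pixel values.
    (row, col): top-left corner of the chip inside the clutter image.
    The chip is assumed to fit entirely inside the clutter image.
    """
    out = [list(r) for r in clutter]  # leave the original clutter untouched
    for i, target_row in enumerate(target):
        for j, t in enumerate(target_row):
            out[row + i][col + j] += t  # complex add: I adds to I, Q adds to Q
    return out


# Hypothetical 2 x 2 clutter channel and 1 x 1 target chip.
clutter_hh = [[1 + 1j, 2 + 0j], [0j, 1j]]
target_hh = [[0.5 - 1j]]
print(embed_target(clutter_hh, target_hh, 1, 1))
```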

The PWF transformation maps the three complex values of each pixel to a single
image pixel value. Therefore, the PWF transformed image is useful for locating the
features which might ultimately be useful for recognition. After target embedding, further
processing can be applied to the image with targets embedded before the
detection/recognition algorithms are employed. For example, PWF processing can be applied
for speckle reduction in the image.


2.7.2  Embedding  the  TABILS  24  ISAR  Target  Data  into  the  Mission  90  Pass  5 SAR  Data 
Set 

In order to migrate the many targets of the TABILS 24 ISAR target data set into the
strip-mode SAR image, an appropriate transformation of polarization bases must be applied
to the ISAR target images, which are circular-polarization-basis images. Therefore, the
circular-polarization-basis images (i.e., LL, LR and RR) of the TABILS 24 ISAR targets are
transformed to the corresponding linear-polarization-basis images (HH, HV, and VV). These
transformed target images are coherently added into the locations of the mission 90 pass 5
SAR imagery selected for target embedding, as discussed in detail below.

The embedding method developed for testing performance with targets in clutter
(Figure 11) combines the turntable data (TABILS 24) with the SAR data (mission 90 pass 5).
No embedding method will be perfect, but it is our belief that the method we are using
preserves the gross statistical characteristics of non-occluded targets in clutter upon
which the pre-detection schemes in the focus of attention stage depend. Independent
testing of the algorithms by the MIT Lincoln Laboratory on real targets in clutter (SAR
data in which targets were in the field of view during data collection) corroborated the
superiority of the CFAR/QGD combination [63]. This suggests that the embedding method used
is sufficient for development and generation of preliminary ROCs.

We chose ISAR target data that was taken at approximately the same depression angle
as the mission 90 pass 5 data (23 degrees). Our assumption is that ISAR data with the same
resolution as the mission 90 data, and with depression angles that differ from the
mission 90 pass 5 depression angle by less than 3 degrees, are suitable for embedding.

ISAR target images were collected from 22 target data sources of the TABILS 24 data
set. Since some of the 22 target data sources have the same target types but were measured
under different weather conditions, the 22 target data sources consist of 10 different
target types. For each target data source, target images were extracted at 7.5° azimuth
increments over a complete 360° of azimuth. This resulted in about 1000 target images
(345 training, 345 cross validation, and 345 testing). The ISAR target images were
distributed over the training, cross validation, and testing sets by azimuth; that is, the
target images were picked up at each increment step of 7.5° and assigned to the training,
cross validation, and testing sets in sequence. The targets then need to be placed in the
clear and in between large scattering centers in regions of cultural clutter.
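One reading of the assignment "in sequence" is a round-robin split over azimuth: the first image of a source goes to training, the next to cross validation, the next to testing, and so on. The sketch below illustrates that reading for a single target data source; it is an assumption about the procedure, the function name is hypothetical, and it does not reproduce the exact set sizes quoted above.

```python
def split_by_azimuth(step_deg=7.5):
    """Assign azimuth angles 0, step, 2*step, ... round-robin to three sets."""
    n_angles = int(round(360.0 / step_deg))
    angles = [k * step_deg for k in range(n_angles)]
    sets = {"training": [], "cross_validation": [], "testing": []}
    names = list(sets)
    for k, angle in enumerate(angles):
        sets[names[k % 3]].append(angle)
    return sets


sets = split_by_azimuth()
for name, angles in sets.items():
    print(name, len(angles), angles[:3])
```

Under this reading, neighbouring images within any one set are 22.5° apart, while each set still covers the full range of aspect angles.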

In order to embed the targets into the clutter data, three polarizations of complex
ISAR data are extracted (HH, HV, VV). These are added coherently to the different regions
in the three polarizations of complex SAR data. The PWF transformation is then applied
to the new data. The PWF transformed data is scaled logarithmically to the range [0, 255].
Since the points at which the ISAR data is embedded are range pixels with low RCS (radar
cross section), although surrounding regions may contain range cells with very large RCS,
the large scattering centers on the ISAR targets are not changed significantly.
These are the points upon which the prescreening algorithms depend.

An example of the embedding process is shown in Figure 12. At the top of the figure
is an example of an ISAR image (PWF). At the bottom left of the figure is a section of
cultural clutter. At the bottom right is the same section with the ISAR target embedded.
The images are shown after PWF processing, but of course, the target is embedded prior to
such processing. As can be seen, the target signature is not changed dramatically after
embedding. Figure 13 displays some embedded targets at a variety of aspect angles.


Figure 12 Target embedding. Top: original ISAR image chip, no background. Bottom left:
cultural clutter region, no embedded target. Bottom right: cultural clutter region with
target embedded.

Figure 13 Some examples of the TABILS 24 ISAR targets at a variety of aspect angles
after embedding.


CHAPTER  3 


PRESCREENERS 
3.1  Introduction 

As mentioned in Chapter 1, the front-end detection stage aims to locate potential
targets in the sensor imagery and allows for significant reduction of data processing in
subsequent stages. Because it processes the entire imagery directly, the front-end
detection stage requires computationally simple and efficient algorithms.

A two-parameter CFAR detector has been used as a prescreener for the front-end
detection stage [57]. It computes a Mahalanobis distance between a pixel under test and
its neighbor pixels defined in a window of predetermined size (the CFAR stencil). The
two-parameter CFAR detector meets the requirements of algorithm simplicity and efficiency.
There is, however, room for improvement of the CFAR detector, and we discuss this
possibility. We introduce a novel detector which is called the γCFAR detector.
The γCFAR detector extends the conventional CFAR structure by using a set of gamma
kernels as an alternative to the fixed and predetermined size of the CFAR stencil.

Before we discuss the two-parameter CFAR detector, we start with a one-parameter
CFAR detector to pave the way for the two-parameter CFAR detector.

3.2  A One-Parameter  Constant  False  Alarm  Rate  (CFAR)  Detector 

A cell-averaging (CA) CFAR detector controls the detection threshold for a specific
resolution cell based on the estimate of a sufficient statistic of the clutter. The
detector estimates the clutter power in the cells surrounding a cell under test. When the
clutter is statistically homogeneous over the resolution cells, this detector works well;
otherwise, the performance degrades.

In order to implement a CA operation, a stencil can be used locally for a particular
location in the image (Figure 14). A test pixel is defined at the center area ($R_t$) of the
stencil and the clutter intensity mean is estimated in a local region ($R_c$) at a distance
away from the pixel under test. We refer to the stencil as the CFAR stencil. The output of
the CA-CFAR operation can be expressed as

$$ y = \frac{1}{N_t} \sum_{(i,j) \in R_t} x(i,j) - \frac{1}{N_c} \sum_{(i,j) \in R_c} x(i,j) \tag{49} $$

where $x$ represents the intensity pixels and $N_t$ and $N_c$ are the numbers of pixels in
$R_t$ and $R_c$ in the stencil. This CA-CFAR detector can be called a one-parameter CFAR
detector because only the mean information is utilized in the operation in (49).

The hypothesis testing is then performed over the outputs after the CA-CFAR
processing such that

$$ y \underset{\text{non-target}}{\overset{\text{target}}{\gtrless}} T_{CFAR} \tag{50} $$
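A minimal sketch of the one-parameter test is given below, under our reading of the CA-CFAR output as the mean intensity over the centre (target) area minus the mean over the surrounding clutter ring. The square stencil, its default sizes, and the function names are assumptions made for illustration; border pixels are not handled.

```python
def stencil_pixels(image, r, c, half_target, half_guard, half_outer):
    """Split a square stencil centred at (r, c) into target and clutter pixels.

    The stencil must lie fully inside the image (no border handling here).
    """
    target, clutter = [], []
    for i in range(r - half_outer, r + half_outer + 1):
        for j in range(c - half_outer, c + half_outer + 1):
            d = max(abs(i - r), abs(j - c))  # square ring index of pixel (i, j)
            if d <= half_target:
                target.append(image[i][j])   # centre area
            elif d > half_guard:
                clutter.append(image[i][j])  # outer ring beyond the guard area
    return target, clutter


def ca_cfar_output(image, r, c, half_target=0, half_guard=2, half_outer=3):
    """Mean over the target area minus mean over the clutter ring."""
    target, clutter = stencil_pixels(image, r, c, half_target, half_guard, half_outer)
    return sum(target) / len(target) - sum(clutter) / len(clutter)


def is_target(y, threshold):
    """Declare a target when the CA-CFAR output exceeds the threshold."""
    return y > threshold


# Hypothetical 9 x 9 image: flat clutter with one bright pixel at the centre.
image = [[1.0] * 9 for _ in range(9)]
image[4][4] = 11.0
y = ca_cfar_output(image, 4, 4)
print(y, is_target(y, 5.0))
```

Multiplying every pixel of the image by a constant multiplies the output by the same constant, so a fixed threshold does not carry over between differently scaled images.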


From a signal processing perspective, the one-parameter CFAR detector in (49) is a
band-pass filter running over the image, in which the difference is taken between the
outputs of two low-pass filters, one defined by the target mask of the template and the
other by the clutter mask. Abrupt changes in image intensities produce large outputs from
the one-parameter CFAR detector. Since man-made objects usually scatter strongly back to
millimeter-wave SARs, the large intensity differences between pixels under test and their
clutter background are easily detected.

However, an output of the one-parameter CFAR processing in (49) is the result of
convolving the image with two rectangular-shaped kernels, which cause large losses or
ripples outside their main lobes in the frequency domain. Contrast enhancement between
targets and clutter can be achieved by incorporating smooth-shaped kernels in the stencil
[26].

The threshold of the one-parameter CFAR detector in (49) is sensitive to the scale of
the image: the threshold is linearly proportional to the scale factor of the image.
In the next section, a two-parameter CFAR detector is introduced which incorporates a
normalization factor. This factor is the standard deviation of the clutter in $R_c$.

3.3  A Two-Parameter  CFAR  Detector 

A two-parameter CFAR detector was first developed for use on 1-D range profiles by
Goldstein [29]. Later, Novak et al. extended it for use in SAR imagery [56] [57].

The two-parameter CFAR detector has been commonly used as a prescreener in
multi-stage SAR ATD/R systems. Its popularity is due to an excellent figure of merit in
terms of performance/simplicity. The name indicates that a constant false alarm
probability of detection is achieved, but in fact this is only true for Gaussian
distributed targets and clutter [57]. Normally in SAR imagery this is not the case for
targets in clutter. Nevertheless, experience has shown that the two-parameter CFAR is a
robust detector for man-made clutter and targets [57].

The reasons for this success can be found in the simplicity and the discriminating
power of the test. Basically, the CFAR compares the intensity of a pixel under test with the
normalized intensity of a surrounding area. Since man-made objects are normally bright in
SAR imagery, this is a very effective test, which can be efficiently implemented in digital
hardware. In terms of statistical detection theory, the CFAR estimates the parameters of
the local intensity probability density function (pdf) and makes a decision when there is
a brightness deviation of the pixel under test with respect to the normalized mean
background intensity, i.e.

$$\frac{x - \bar{X}_c}{\hat{\sigma}_c} \underset{\text{clutter}}{\overset{\text{target}}{\gtrless}} T_{CFAR} \tag{51}$$

Figure 14 CFAR stencil. The amplitude of the test pixel is compared with the mean
and the standard deviation of the local area. The guard area ensures that no target
pixels are included in the measurement of the local statistics [57].


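where $x$ is a pixel under test, $T_{CFAR}$ is a threshold for the two-parameter CFAR detector,
and $\bar{X}_c$ and $\hat{\sigma}_c$ are the estimates of the local statistics of mean and standard deviation
measured in a defined local area by the CFAR stencil (see Figure 14). The estimates $\bar{X}_c$
and $\hat{\sigma}_c$ are computed by

$$\bar{X}_c = \frac{1}{N_c} \sum_{(i,j) \in \Omega_c} x(i,j) \tag{52}$$

where $x(i,j)$ are the pixel intensities at locations $(i,j)$, and

$$\hat{\sigma}_c = \sqrt{\frac{1}{N_c} \sum_{(i,j) \in \Omega_c} \left( x(i,j) - \bar{X}_c \right)^2} \tag{53}$$

where $\Omega_c$ defines the local area where $\bar{X}_c$ and $\hat{\sigma}_c$ are computed, and $N_c$ is the number of
pixels in $\Omega_c$.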
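The test in (51) with the estimates (52) and (53) can be sketched in a few lines. The following is a minimal illustration only: the square ring stencil, the `guard` and `outer` sizes, the synthetic scene and the threshold value are all assumptions chosen for the example, not parameters taken from this work.

```python
import math

def two_parameter_cfar(image, row, col, guard=1, outer=3):
    """Return (x - mean_c) / sigma_c for the pixel at (row, col).

    Clutter statistics come from a square ring (hypothetical stencil): pixels
    within `outer` of the test pixel but farther than `guard`, measured with
    the Chebyshev distance. Pixels outside the image are skipped.
    """
    ring = []
    for i in range(row - outer, row + outer + 1):
        for j in range(col - outer, col + outer + 1):
            inside = 0 <= i < len(image) and 0 <= j < len(image[0])
            if inside and max(abs(i - row), abs(j - col)) > guard:
                ring.append(image[i][j])
    n_c = len(ring)
    mean_c = sum(ring) / n_c                                          # (52)
    sigma_c = math.sqrt(sum((v - mean_c) ** 2 for v in ring) / n_c)   # (53)
    return (image[row][col] - mean_c) / sigma_c                       # (51)

# Synthetic 9x9 scene: mildly varying clutter with one bright pixel.
image = [[10.0 + ((3 * i + 5 * j) % 4) for j in range(9)] for i in range(9)]
image[4][4] = 40.0

T_CFAR = 5.0  # illustrative threshold, not a value from the text
statistic = two_parameter_cfar(image, 4, 4)
print("target" if statistic > T_CFAR else "clutter")
```

Because the statistic is normalized by the local standard deviation, multiplying the whole image by a constant leaves the decision unchanged, which is the scale insensitivity that motivates the second parameter.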

The shape of the stencil ensures that when the center pixel is on target, the
neighborhood falls in the background such that its local statistics can be reasonably well
estimated. The shape of the stencil (in particular the guardband) is governed by the target
size [54]. In SAR imagery, the reflectivity of the object is only weakly coupled to its
geometric shape, so a priori stencil dimensions based solely on target size cause suboptimal
performance.

In terms of statistical pattern recognition, we can interpret the CFAR in a slightly
different way. One can think that the CFAR stencil is extracting intensity features in the
neighborhood of the pixel under test. In fact, the CFAR equation can be rewritten as

$$x^2 - 2x\bar{X}_c + \bar{X}_c^2 - T_{CFAR}^2\,\overline{X_c^2} + T_{CFAR}^2\,\bar{X}_c^2 \underset{\text{clutter}}{\overset{\text{target}}{\gtrless}} 0 \tag{54}$$

where $\bar{X}_c^2$ and $\overline{X_c^2} \left( = \frac{1}{N_c} \sum_{(i,j) \in \Omega_c} x^2(i,j) \right)$ are the square of an estimated mean and an
estimated mean of intensity power, respectively, both of which are measured in a neighboring
area selected by the CFAR stencil.

Notice that this expression is a linear combination (with fixed weighting) of the image
intensity and its square at the pixel and the mean, power, and mean square of the intensity
in the neighborhood. Hence we can interpret the CFAR as implementing a "restricted"
linear discriminant function of quadratic terms of the image intensity.

From this perspective, the two-parameter CFAR can be improved in three respects: first, it
uses only some of the quadratic terms of the intensity at the pixel and its surroundings;
second, it implements a fixed parametric combination of these features; and third, there is
little flexibility in the feature extraction because the kernel is ad hoc. These three
aspects can be greatly improved if more mathematically oriented features are computed and if
trainable classifiers are built. In Chapter 4, this is exactly what the quadratic gamma
detector (QGD) and, even more so, the NL-QGD (Nonlinear Extension to QGD) will provide.

3.4  Extension  to  the  Two-Parameter  CFAR  Detector 

3.4.1  Introduction 

In this section, the conventional CFAR detector discussed in the previous section is
improved by incorporating a new stencil, which is called the γCFAR (gamma CFAR) stencil.
In Section 3.4.2, the gamma filter and gamma kernel [19] [64] are reviewed. They
constitute the basis of the new stencil. The 2-D extension of the 1-D gamma kernels is
introduced in Section 3.4.3.

As extensions to the conventional CFAR detector, γCFAR detectors are introduced
and their characteristics are discussed in Section 3.4.4.

3.4.2  Gamma  Filter  and  Gamma  Kernel 

The gamma kernel was originally developed for time series analysis as a short-term
memory network structure for sequence-processing neural networks [19]. With the property
of completeness in Hilbert space ($L^2$ space), any finite energy signal can be approximated
arbitrarily closely using a finite number of gamma kernels. The gamma kernels are defined
as

$$g_k(t) = \frac{\mu^k}{(k-1)!}\, t^{k-1} e^{-\mu t}, \qquad k = 1, 2, \ldots, \quad t \ge 0 \tag{55}$$


The implementation of the gamma kernels is accomplished by introducing a local
feedback loop around the delay operator, as shown in Figure 15. The impulse response from
the input to the kth tap generates the kth-order gamma kernel given by (55). This structure
is called the gamma filter. The Laplace transform of the gamma kernels is given as

$$G_k(s) = \left( \frac{\mu}{s + \mu} \right)^k = G^k(s) \tag{56}$$

where

$$G(s) = \frac{\mu}{s + \mu} \tag{57}$$

The kth-order gamma filter is characterized by a pole at $s = -\mu$ with multiplicity k. The
locations of the zeros are determined by the weights $w_1, w_2, \ldots, w_k$ and the parameter μ. The
discrete time gamma filter is depicted in Figure 15b. The discrete time gamma kernels and
their corresponding z-transforms are given by

$$g_k(n) = \binom{n-1}{k-1} \mu^k (1 - \mu)^{n-k} \qquad \text{for } n \ge k \tag{58}$$

$$G_k(z) = \left[ \frac{\mu}{z - (1 - \mu)} \right]^k = G^k(z) \tag{59}$$
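As a small numerical check of (58) and (59), the sketch below generates the tap impulse responses of a discrete gamma filter from the first-order recursion implied by $G(z) = \mu / (z - (1 - \mu))$ and compares them with the closed form. The function names and the chosen order, μ and length are illustrative assumptions.

```python
from math import comb

def gamma_kernels_recursive(order, mu, length):
    """Impulse responses of taps 1..order of a discrete gamma filter.

    Each tap obeys x_k(n) = (1 - mu) * x_k(n-1) + mu * x_{k-1}(n-1),
    where x_0(n) is the unit impulse applied at n = 0.
    """
    taps = [[0.0] * length for _ in range(order + 1)]
    taps[0][0] = 1.0
    for n in range(1, length):
        for k in range(1, order + 1):
            taps[k][n] = (1.0 - mu) * taps[k][n - 1] + mu * taps[k - 1][n - 1]
    return taps[1:]

def gamma_kernel_closed_form(k, mu, n):
    """g_k(n) from (58); zero for n < k."""
    if n < k:
        return 0.0
    return comb(n - 1, k - 1) * mu ** k * (1.0 - mu) ** (n - k)

kernels = gamma_kernels_recursive(order=3, mu=0.5, length=60)
max_error = max(
    abs(kernels[k - 1][n] - gamma_kernel_closed_form(k, 0.5, n))
    for k in range(1, 4)
    for n in range(60)
)
print(max_error)
```

The recursion only ever feeds a tap from itself and from the previous tap, so the structure has no global feedback path.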


The gamma filters in both continuous and discrete time are locally recurrent, globally
feedforward structures.

Figure 15 The gamma filter structures a) in continuous time and b) in discrete time.

Figure 16 Shape of gamma kernels affected by the parameter and kernel order (μ and k).
a) Effect of changing the kernel order (k).

The discrete gamma filter generalizes a tapped delay line structure. For the case of μ =
1, the gamma filter reduces to a transversal filter (or a finite impulse response (FIR) filter).
The stability of the continuous time gamma filter is guaranteed because the system poles
always lie in the left half plane of the Laplace domain as long as μ > 0. For the
discrete case, the poles are located inside the unit circle in the Z-domain as long as
0 < μ < 2. So the stability problem in discrete time is also trivial. The interesting
point is that the functions constitute a non-orthogonal basis that is complete for signals of
finite energy (in $L^2$) [19]. Hence we can interpret the gamma filter output as a projection
of the input signal onto a linear space defined by the convolution of the input with the basis
functions [20].

Figure 16 illustrates, in continuous time, an example of gamma kernel shapes as
functions of the kernel order k and parameter μ. Note that the shape of the kernels is very
similar whether one selects the order for a fixed μ or selects μ for a given kernel order.

The main characteristic of the gamma filter is that time in the filter taps is scaled by
the feedback parameter μ (linear time warping). In other words, the region of support of
the impulse response is controlled by the single parameter μ, such that by changing μ the
impulse response can be stretched out or shrunk like a rubber band (Figure 16b). In a signal
processing framework, the parameter μ can be adapted with the output mean square error
using a gradient descent approach, so that the best local features are captured by the filter
[64].

Linear filters are often characterized in the frequency domain in terms of their
magnitude and phase responses. For the temporal processing of signals, linear filters such
as the FIR, IIR and gamma filters can be understood as memory filters, which are
characterized by their memory depth and memory resolution. The memory depth D is
defined as the temporal mean value of the impulse response g(t) of a filter used for storing
temporal information [19],

$$D = \int_0^{\infty} t\, g(t)\, dt \qquad \text{for continuous time} \tag{60}$$

$$D = \sum_{n=0}^{\infty} n\, g(n) \qquad \text{for discrete time} \tag{61}$$


The memory resolution R is defined as the number of degrees of freedom (i.e., memory
state variables) per unit time. As a memory filter, the impulse response of a kth-order
FIR filter at each tap is $g_k(n) = \delta(n - k)$. The transfer function in the Z-domain is
given by $G(z) = z^{-1}$. The memory depth D of the filter is D = K. That is, the memory
depth of an FIR filter is equal to the filter order. Because the memory depth is limited by
the filter order, a low-order FIR filter has a poor modeling capability for low-pass
frequency-bounded signals [64]. On the other hand, the structure of an IIR filter contains
feedback connections, and consequently the memory depth is not limited by the filter order.
IIR filters, however, suffer from stability problems.

The memory depth of a kth-order gamma filter for both discrete and continuous time
domains is obtained by (61) as

$$D = \frac{k}{\mu} \tag{62}$$

Contrary to an FIR filter, a gamma filter has a memory depth uncoupled from the filter
order. The gamma filter shares the property of IIR filters in memory depth due to the locally
introduced feedback between taps. This also relaxes the stability problem as long as the
local feedback parameter is limited to the range 0 < μ < 2 for the discrete case.

The memory resolution of a kth-order discrete time gamma filter is equivalent to the
number of taps divided by the memory depth,

$$R = \frac{k}{D} = \mu \tag{63}$$

(63) can be written as

$$k = D \times R \tag{64}$$

For a given order k, a trade-off between memory depth and memory resolution is
possible, so that a very deep memory structure can be obtained at the expense of a low
memory resolution. In the FIR case, the memory depth is the filter order, and this is the
special case of μ = 1 in the gamma filter.
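The depth and resolution relations can be checked numerically from the discrete definition (61) applied to the kernel (58). This is a minimal sketch: the truncation length and the sample values of k and μ are arbitrary choices for illustration.

```python
from math import comb

def memory_depth(k, mu, length=2000):
    """Numerical D = sum_n n * g_k(n) for the discrete gamma kernel (58).

    The infinite sum in (61) is truncated at `length`, which is ample for
    the values of mu used here.
    """
    return sum(
        n * comb(n - 1, k - 1) * mu ** k * (1.0 - mu) ** (n - k)
        for n in range(k, length)
    )

k = 4
for mu in (1.0, 0.5, 0.25):
    depth = memory_depth(k, mu)
    resolution = k / depth  # taps per unit of memory depth
    print(mu, round(depth, 6), round(resolution, 6))
```

With μ = 1 the computed depth equals the filter order, the FIR case. Lowering μ deepens the memory of the same four taps while the resolution drops in proportion.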

3.4.3  2-D  Extension  of  1-D  Gamma  Kernels 

Theoretically the gamma filter can be extended to N dimensions without problems.
This is done by simply substituting t in (55) with an N dimensional basis vector. In this
extension, circularly symmetric gamma kernels are obtained, but more general cases can
be considered. Specifically for our applications to 2-D (image) data, the gamma kernels
can be extended by

$$g_{n,\mu}(k,l) = C\, g_n(t) \Big|_{t = \sqrt{k^2 + l^2}} \tag{65}$$

where the constant C is a normalization factor. The resulting 2-D kernel has a circularly
symmetric shape given by

$$g_{n,\mu}(k,l) = \frac{\mu^{n+1}}{2\pi\, n!} \left( \sqrt{k^2 + l^2} \right)^{n-1} e^{-\mu \sqrt{k^2 + l^2}}, \qquad \Omega = \{ (k,l);\ -N \le k, l \le N \} \tag{66}$$

where Ω is the region of support of the kernel, n the kernel order, and μ the parameter that
controls the shape and scale of the kernel.

Since the 2-D circularly symmetric gamma kernels are created from the corresponding 1-D
gamma kernels in the spatial domain, they preserve the spatial characteristics of the 1-D
gamma kernels. That is, the concept of a time warping parameter extrapolates to the spatial
domain as a scale parameter that controls the region of support of the 2-D gamma kernels.

If the 2-D gamma kernel in (66), defined in the Cartesian coordinate system, is
converted into the circular (polar) coordinate system, the memory depth of the 2-D gamma
kernel can be easily obtained. By letting $k = r\cos\theta$, $l = r\sin\theta$ and considering
$dk\,dl = r\,dr\,d\theta$, (66) in the circular coordinate system is given by


(67) 


The memory depth in the 2-D spatial domain is expressed as

$$D = \int_0^{2\pi} \!\! \int_0^{\infty} r\, g_{n,\mu}(r, \theta)\, dr\, d\theta \tag{68}$$

(68) is equivalent to the 1-D case. This means that the spatial characteristics of the 1-D
gamma kernels are preserved in the 2-D domain by the circularly symmetric rotation.

Figure 17 shows the characteristics of 2-D gamma kernels in the spatial domain. The
1st-order (n = 1) kernel has its peak at the pivot point (0,0) with an exponentially decaying
amplitude. All the other kernels have a peak at the radius n/μ, creating concentric smooth
rings around the pivot point (Figure 17). For a fixed order (n = 15) the radial distance
where the kernel peaks is still dependent upon the parameter μ, as in the 1-D case.

3.4.4 Gamma CFAR (γCFAR) Detector

By analogy to the CFAR stencil, any combination of the first kernel with one of the
higher order kernels produces a similar stencil, although the shapes with the 2-D gamma
kernels are smoother. We call this stencil the γCFAR.

It is interesting to contrast the γCFAR detector with the CFAR detector. Target
masking in the CFAR stencil is implemented by the first-order 2-D gamma kernel, so that the
target intensity mean is estimated closely around the center pixel under test, while a higher
order kernel creates a clutter mask so that the statistics of clutter are measured in the
round-shaped ring. With the γCFAR, we have a better handle on the shape of the kernel due to
its analytic formalization. Figure 18 shows the γCFAR stencil for the CFAR test.

Figure 17 2-D gamma kernels (n = 1, μ = 0.357; n = 15, μ = 0.776).

Since the parameter μ in 2-D has exactly the same function as its 1-D counterpart,
i.e., it shrinks or stretches the region of support of the image response, we can adaptively
select μ to better perform the CFAR test.

In fact, after fixing the order of the kernel, we have a single parameter that controls its
spatial extent, and we can derive equations that will change the parameter μ to minimize
an output error. Figure 19 shows the block diagram of a one-parameter γCFAR detector.
Two gamma kernels are linearly combined to form the detector output y. We call this the
one-parameter γCFAR detector, a counterpart of the one-parameter CFAR detector in (49).
The output can be written as

$$y = w_1 \left( g_{m,\mu_m} \bullet X \right) + w_2 \left( g_{n,\mu_n} \bullet X \right) \tag{69}$$

where • represents a convolution operator, $w_1$ and $w_2$ are weights, the kernel order m = 1,
and $\mu_m$ and $\mu_n$ are the parameters that control the extent of the kernels. It is expected
that $w_2$ needs to be negative so that the output will be high over areas of largest contrast,
as is the case in the one-parameter CFAR detector.

In addition to the degrees of freedom associated with the controllability of μ for the
kernel extent, the smooth shape of the gamma kernels yields smaller sidelobes than those
associated with the CFAR stencil. From a signal processing perspective, as is the case in
the one-parameter CFAR detector, the one-parameter γCFAR detector performs bandpass
filtering, but it can select the frequency bands of the bright pixel intensities in the image
by adapting the region of support of the kernels, and the energy of the pixel intensities is
better preserved in the selected frequency band, with less loss, due to the smoothness of
the gamma kernels.

Interpreted from the projection point of view, the input signal (an image in the 2-D
case) is still being projected onto a local basis obtained by convolving the input with the
first kernel (m = 1) and a higher order kernel (n > 1).

Figure 18 The γCFAR detector: a) the center kernel has an order of 1 and the surrounding
kernel is of order 15; b) the surrounding kernel defines a local area where the local
statistics of mean and standard deviation are measured. The peaky kernel averages the pixel
under test and its closest neighboring pixels.

Due to the exponential amplitude decay from the pivot point, the spatial extent of the
kernel can be truncated to a small value. This projection on gamma kernels of order m and n
will not be complete, meaning that the input image cannot be recovered from the
projection. But as the CFAR experience shows, it will still preserve the important features
to distinguish man-made objects from clutter.

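The decision rule in the γCFAR detector is defined as

$$\frac{g_{m,\mu_m} \bullet X - g_{n,\mu_n} \bullet X}{\hat{\sigma}} \underset{\text{clutter}}{\overset{\text{target}}{\gtrless}} T_{\gamma CFAR} \tag{70}$$

where • represents the inner product operator, $T_{\gamma CFAR}$ is a threshold for determining target
or clutter, and

$$\hat{\sigma} = \sqrt{ g_{n,\mu_n} \bullet X^2 - \left( g_{n,\mu_n} \bullet X \right)^2 } \tag{71}$$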

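As shown in (70) and (71), the local mean and standard deviation are computed by a
higher order kernel (n > 1), and the pixel under test is averaged with its adjacent pixels,
which are given less emphasis the farther they are from the pivot point. The γCFAR decision
can also be recast as a discriminating function

$$\left( g_{m,\mu_m} \bullet X \right)^2 - 2 \left( g_{m,\mu_m} \bullet X \right) \left( g_{n,\mu_n} \bullet X \right) + \left( g_{n,\mu_n} \bullet X \right)^2 - T_{\gamma CFAR}^2 \left[ g_{n,\mu_n} \bullet X^2 - \left( g_{n,\mu_n} \bullet X \right)^2 \right] \underset{\text{clutter}}{\overset{\text{target}}{\gtrless}} 0 \tag{72}$$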

It  is  instructive  to  analyze  experimentally  the  dependency  of  the  accuracy  of  the 
CFAR  test  as  a function  of  the  size  of  the  stencil,  now  that  we  have  a parametric  form  for 
the  stencil. 
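The γCFAR statistic can be sketched at a single pixel as follows. This is an illustration under stated assumptions: the stencils are rescaled to sum to one on the truncated grid so that the inner products act as weighted means, and the kernel orders, the values of μ, the synthetic scene and the threshold are hypothetical choices, not the settings used in the experiments.

```python
import math

def gamma_stencil(order, mu, half_width):
    """2-D gamma-shaped weights on a square grid, rescaled to sum to one."""
    size = 2 * half_width + 1
    w = [[0.0] * size for _ in range(size)]
    for k in range(size):
        for l in range(size):
            r = math.hypot(k - half_width, l - half_width)
            w[k][l] = r ** (order - 1) * math.exp(-mu * r)
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def inner(stencil, image, row, col, power=1):
    """Inner product of the stencil with image**power, centred at (row, col)."""
    n = len(stencil) // 2
    return sum(
        stencil[k + n][l + n] * image[row + k][col + l] ** power
        for k in range(-n, n + 1)
        for l in range(-n, n + 1)
    )

def gamma_cfar_statistic(image, row, col, g_m, g_n):
    a = inner(g_m, image, row, col)            # mean close to the test pixel
    b = inner(g_n, image, row, col)            # clutter mean on the ring
    c = inner(g_n, image, row, col, power=2)   # clutter power on the ring
    return (a - b) / math.sqrt(c - b * b)      # left side of the decision rule

# Synthetic 41x41 scene: varying clutter plus a bright 3x3 object at the centre.
image = [[10.0 + ((3 * i + 5 * j) % 4) for j in range(41)] for i in range(41)]
for i in range(19, 22):
    for j in range(19, 22):
        image[i][j] = 40.0

g_m = gamma_stencil(order=1, mu=1.5, half_width=12)
g_n = gamma_stencil(order=5, mu=1.0, half_width=12)
T = 3.0  # illustrative threshold
on_target = gamma_cfar_statistic(image, 20, 20, g_m, g_n)
on_clutter = gamma_cfar_statistic(image, 28, 28, g_m, g_n)
print(on_target > T, on_clutter > T)
```

Changing `mu` in either call to `gamma_stencil` stretches or shrinks the corresponding region of support without touching the rest of the test, which is what makes an experimental study of stencil size straightforward.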

The local image features computed by the convolution can be specified as projections
of intensity and power onto gamma kernels,

$$g_{n,\mu} \bullet X = \sum_{(i,j) \in \Omega} g_{n,\mu}(i,j)\, x(i,j) \tag{73}$$

$$g_{n,\mu} \bullet X^2 = \sum_{(i,j) \in \Omega} g_{n,\mu}(i,j)\, x^2(i,j) \tag{74}$$

Note that these operations can be interpreted as FIR filtering with the gamma stencil
$g_{n,\mu}(i,j)$. The output values can be viewed as estimates of the local 1st and 2nd moments,
respectively, which are sufficient to compute the local variance. The local standard
deviation in (53) is needed to compute the two-parameter CFAR statistics, and has been shown
to be an important measure in deciding whether a target is present or not [57].

The γCFAR detector separates targets from clutter based on the values of 4 features
derived from the gamma-kernel projections, which are needed to compute the local
mean and standard deviation. These terms enable a one-to-one comparison of the γCFAR
detector and the two-parameter CFAR detector, and are also used to study the benefits of
using adaptive feature extraction and adaptive weights in Chapter 4.

In conclusion, the potential advantage of the γCFAR detector implementation is that
the extent of the localized mean and surrounding areas can be adaptively set. This could
lead to fine tuning of the area where local statistics are measured. The γCFAR detector,
based on 2-D gamma filters, is a promising extension of the conventional CFAR detector.

3.4.5 Implementation of the γCFAR Detector

Computing the feature set at each pixel of the image amounts to correlating each of
the two kernels $g_{m,\mu_m}$ and $g_{n,\mu_n}$ with both the original image and the image squared. Each
kernel assumes the role of an FIR filter with rectangular support (size (2N + 1) x (2N + 1)).
The three base features ($g_{m,\mu_m} \bullet X$, $g_{n,\mu_n} \bullet X$ and $g_{n,\mu_n} \bullet X^2$) at a point (i,j) in the image
are then obtained using translated gamma kernels as

$$g_{n,\mu} \bullet X^p (i,j) = \sum_{k=-N}^{N} \sum_{l=-N}^{N} g_{n,\mu}(k,l)\, x^p(i+k, j+l), \qquad p = 1 \text{ or } 2 \tag{75}$$

where n stands for the kernel order and p indicates either the first or the second moment.
Correlations are computed in the frequency domain using FFTs to obtain better
computational efficiency. For processing large images and to avoid memory problems, we divide
the image into overlapping radix-2 windows which are individually processed and
combined using an overlap-and-save method [59].

3.5 Receiver Operating Characteristic (ROC)

A plot of detection probability $P_d$ as a function of false alarm probability $P_f$ is referred
to as a receiver operating characteristic (ROC) curve. The ROC is a very important
assessment for detection systems and depends on the probability density function of an
observed signal under each hypothesis $H_j$, that is, $p_{y|H_j}(y|H_j)$, j = 0, 1. Figure 20 depicts
an ROC curve together with the probability density functions for the two hypotheses. In
Figure 20a, varying a threshold K controls the areas representing $P_d$ and $P_f$. The
corresponding ROC curve is shown in Figure 20b.
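An empirical ROC curve is obtained by sweeping the threshold over the detector outputs and counting, at each setting, the fraction of targets and of clutter samples that exceed it. The sketch below uses made-up scores purely for illustration; `roc_points` is a hypothetical helper name.

```python
def roc_points(target_scores, clutter_scores):
    """Return (P_f, P_d) pairs obtained by sweeping the decision threshold."""
    thresholds = sorted(set(target_scores) | set(clutter_scores), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is declared
    for t in thresholds:
        p_d = sum(s >= t for s in target_scores) / len(target_scores)
        p_f = sum(s >= t for s in clutter_scores) / len(clutter_scores)
        points.append((p_f, p_d))
    return points

# Hypothetical detector outputs for targets and clutter.
target_scores = [4.1, 3.2, 5.0, 2.7, 6.3]
clutter_scores = [0.5, 1.9, 2.9, 1.1, 0.2, 3.5, 0.8, 1.4]

roc = roc_points(target_scores, clutter_scores)
# False alarm probability at the first threshold that detects every target.
pf_at_full_detection = next(p_f for p_f, p_d in roc if p_d == 1.0)
print(pf_at_full_detection)
```

The false alarm probability at 100% detection is the operating point most relevant to a focus of attention stage, since missed targets cannot be recovered by later stages.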


Figure  20  ROC:  a)  probability  density  functions,  b)  ROC  curves. 


CHAPTER  4 


QUADRATIC  GAMMA  DETECTOR 
4.1 Introduction

The goal of a prescreener in the front-end detection stage is to locate potential
targets and to eliminate a large amount of clutter in the sensor imagery. The outputs of the
prescreener are regions of interest which need to be further considered in the following
stage, the false alarm reduction stage. The regions of interest are indicated by the
coordinates (i,j) in the image, and the further processing is applied to the locations
reported by the prescreener.

Section 4.2 briefly reviews discriminant functions. Section 4.3 develops a novel
detector (QGD) [63], which is extended from the two-parameter CFAR and γCFAR
detectors, and discusses its discrimination power.

In Section 4.5 a multilayer perceptron (MLP) structure is reviewed, and the training
and generalization ability of the network are discussed. Finally, in Section 4.6, the QGD is
extended into a nonlinear structure, such as an MLP, which is called the NL-QGD (Nonlinear
extension to the QGD).

4.2  Discriminant  Functions 
4.2.1 Linear Discriminant Functions

In general, a discriminant function is an operator that, when applied to a pattern,
yields decisions concerning the class membership of the pattern. The action of a
discriminant function is to produce a mapping from pattern space to attribute space.

A general linear discriminant function in n-dimensional pattern space is of the form

$$g(X) = w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + w_{n+1} = W^T X + w_{n+1} \tag{76}$$

where W is called the weight vector and $w_{n+1}$ the threshold weight. The decision rule of a
two-category classifier is implemented by the following property

$$g(X) = W^T X + w_{n+1} \; \begin{cases} > 0 & \text{if } X \in \omega_1 \\ < 0 & \text{if } X \in \omega_2 \end{cases} \tag{77}$$

The input pattern vector X is categorized into the class $\omega_1$ if g(X) > 0 and into $\omega_2$ if
g(X) < 0. For a multicategory case, linear discriminant functions implement the decision
rule: assign X to $\omega_i$ if $g_i(X) > g_j(X)$ for all $j \ne i$, where $g_i(X) = W_i^T X + w_{i,n+1}$. In case
of ties, the decision is left undefined.


4.2.2  Generalized  Linear  Discriminant  Functions 

The linear discriminant function discussed in the previous section can be extended to
the generalized linear discriminant function of the form

$$g(X) = w_1 f_1(X) + w_2 f_2(X) + \cdots + w_n f_n(X) + w_{n+1} = W^T F \tag{78}$$

where $F = [f_1(X)\ f_2(X)\ \ldots\ f_n(X)\ 1]^T$ is a real-valued vector whose elements are
functions of the pattern X, and W is an (n+1)-dimensional weight vector. Any desired
discriminant function can be approximated by a series expansion $\{f_i(X)\}$ by selecting these
functions judiciously and letting n be sufficiently large.

The discriminant function in (78) is not linear in the original input pattern space, but
linear in the transformed pattern (F) space. The pattern F is therefore separated in the
transformed space by a hyperplane, whereas the pattern X is separated by a hypersurface in
the original pattern space. Therefore, the mapping from X to F reduces the problem to one of
finding a linear discriminant function.


4.3 Extension to the γCFAR Detector

Quadratic discriminant functions implement the optimal classifier for Gaussian
distributed classes [8] [27]. A quadratic discriminant function in d dimensional space is

$$g(X) = \sum_{j=1}^{d} w_{jj} x_j^2 + \sum_{j=1}^{d-1} \sum_{k=j+1}^{d} w_{jk} x_j x_k + \sum_{j=1}^{d} w_j x_j + w_{d+1} \tag{79}$$

where w is a set of adjustable parameters. Probably the most common way to construct a
quadratic classifier is to create a quadratic processor that creates all the terms of g(X),
followed by a linear machine which simply weighs each one of these terms.

Figure 21 depicts the implementation of the quadratic gamma detector (QGD) that
follows this methodology. The QGD extends the γCFAR detector to a generalized form by
exploiting all quadratic and linear terms of the two intensity features $g_{m,\mu_m} \bullet X$ and $g_{n,\mu_n} \bullet X$
in (70).

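Notice that, in addition to the terms of the quadratic form, we include the power terms
$g_{m,\mu_m} \bullet X^2$ and $g_{n,\mu_n} \bullet X^2$ for a direct analogy with the CFAR detector. In fact, our two
input features are the intensity at the pixel under test, $g_{m,\mu_m} \bullet X$, and the intensity in
the ring neighborhood, $g_{n,\mu_n} \bullet X$. From these quantities the traditional quadratic detector
creates $(g_{m,\mu_m} \bullet X)^2$, $(g_{n,\mu_n} \bullet X)^2$, $(g_{m,\mu_m} \bullet X)(g_{n,\mu_n} \bullet X)$, $g_{m,\mu_m} \bullet X$ and $g_{n,\mu_n} \bullet X$, together
with a bias term. For a direct comparison with the CFAR we add $g_{m,\mu_m} \bullet X^2$ and $g_{n,\mu_n} \bullet X^2$
for a total of 8 features. The feature vector reads

$$F = \left[ g_{m,\mu_m} \bullet X \;\; g_{n,\mu_n} \bullet X \;\; g_{m,\mu_m} \bullet X^2 \;\; g_{n,\mu_n} \bullet X^2 \;\; (g_{m,\mu_m} \bullet X)^2 \;\; (g_{n,\mu_n} \bullet X)^2 \;\; (g_{m,\mu_m} \bullet X)(g_{n,\mu_n} \bullet X) \;\; 1 \right]^T \tag{80}$$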


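and the quadratic detector becomes

$$\begin{aligned} y = W^T F = {} & w_1 (g_{m,\mu_m} \bullet X) + w_2 (g_{n,\mu_n} \bullet X) + w_3 (g_{m,\mu_m} \bullet X^2) + w_4 (g_{n,\mu_n} \bullet X^2) \\ & + w_5 (g_{m,\mu_m} \bullet X)^2 + w_6 (g_{n,\mu_n} \bullet X)^2 + w_7 (g_{m,\mu_m} \bullet X)(g_{n,\mu_n} \bullet X) + w_8 \end{aligned} \;\; \begin{cases} > T & \text{(target)} \\ < T & \text{(clutter)} \end{cases} \tag{81}$$

where

$$W = [\, w_1 \;\; w_2 \;\; w_3 \;\; w_4 \;\; w_5 \;\; w_6 \;\; w_7 \;\; w_8 \,]^T \tag{82}$$

Figure 21 Quadratic Gamma Detector (QGD).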

With this formulation, we can understand a little better what was said previously
regarding the restricted nature of the two-parameter CFAR detector when seen from a
pattern recognition standpoint. From pattern recognition theory we know that any quadratic
or higher order polynomial decision function can be implemented with a linear decision
function if the feature vector is appropriately expanded [81]. The added features therefore
help increase the separability between target and clutter in pattern space. The weight
vector W is obtained during a training procedure which will be discussed in the following
Section 4.3.1. The discrimination between the two input classes (target and clutter) is done
using a single threshold. The discrimination function is a quadratic function of the image
intensity features extracted by the gamma kernels, which leads to naming the detector
the QGD (quadratic gamma detector).

The parameter vector for the γCFAR (or equivalently for the two-parameter CFAR) is

$$W = [\, 0 \;\; 0 \;\; 0 \;\; -T_{\gamma CFAR}^2 \;\; 1 \;\; (1 + T_{\gamma CFAR}^2) \;\; -2 \;\; 0 \,]^T \tag{83}$$

where some of the parameters are set to zero and others are fixed. Since an increase in the
number of free parameters of a system is coupled with more flexibility, we can improve
the performance of the γCFAR detector by creating more parameters and adapting them
with representative data.
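The claim that the γCFAR is a special case of the QGD can be verified numerically: with the fixed γCFAR weights, the linear machine applied to the expanded features reproduces the squared form of the γCFAR test. In the sketch below the ordering of the eight features is an assumption of the illustration (linear terms, power terms, squares, cross product, bias), and the numerical feature values are hypothetical.

```python
def qgd_features(a, b, p_m, p_n):
    """Expanded feature vector.

    a = g_m . X, b = g_n . X, p_m = g_m . X^2, p_n = g_n . X^2.
    The ordering is assumed for this sketch.
    """
    return [a, b, p_m, p_n, a * a, b * b, a * b, 1.0]

def gamma_cfar_weights(threshold):
    """Fixed weights that turn the quadratic detector into the gamma CFAR test."""
    t2 = threshold ** 2
    return [0.0, 0.0, 0.0, -t2, 1.0, 1.0 + t2, -2.0, 0.0]

def discriminant(weights, features):
    return sum(w * f for w, f in zip(weights, features))

a, b, p_m, p_n = 30.0, 12.0, 950.0, 150.0   # hypothetical feature values
T = 3.0

y = discriminant(gamma_cfar_weights(T), qgd_features(a, b, p_m, p_n))
direct = (a - b) ** 2 - T ** 2 * (p_n - b * b)   # squared form of the test
print(y, direct)
```

Only three of the eight weights are nonzero and all of them are tied to the single threshold, which is the sense in which the CFAR is a restricted discriminant. Training frees all eight.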

4.3.1 Training the QGD

In an adaptive pattern recognition framework, the free parameters of the classifiers
which define the positioning of the discriminant function for maximum performance are
learned from a set of representative data. This is called the training of the classifier. Given
a set of training image chips $\{X_1, X_2, \ldots, X_N\}$ centered around points of a known class, we
compute the corresponding feature vector F. The corresponding desired values of the chips
are 1's for the target class and 0's for clutter, which construct a desired vector
$d = [d_1, d_2, \ldots, d_N]^T$.

4.3.1.1 Closed Form Pseudo-Inverse Solutions

If the mean square error between the system output and the desired response is the
cost function, there is an analytical solution to the problem [30]. The method solves, in the
least squares sense, an overdetermined (assuming N > 8) system of linear equations for the
unknown coefficient vector W by using the Moore-Penrose pseudo-inverse,

$$\min_{W} \| d - FW \|^2 \tag{84}$$

yielding

$$W = (F^T F)^{-1} F^T d \tag{85}$$

where the rows of the matrix F are the feature vectors of the training chips. When there
exist more weights than equations, the system is under-determined and an optimal weight
can be calculated by

$$W = F^T (F F^T)^{-1} d \tag{86}$$

In either case, when the matrix to be inverted has zero eigenvalues, singular value
decomposition techniques can be used to select the optimal weight vector which has the
smallest Euclidean norm [32].
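For the overdetermined, full-rank case the least squares weights follow from the normal equations. The sketch below solves them for a tiny made-up training set with two features and a bias. The data, the helper names and the use of plain Gauss-Jordan elimination are assumptions for illustration; a practical implementation would normally rely on an SVD-based routine.

```python
def solve(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def least_squares_weights(F, d):
    """W = (F^T F)^{-1} F^T d for an overdetermined, full-rank system."""
    cols = list(zip(*F))
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]
    moment = [sum(x * t for x, t in zip(ci, d)) for ci in cols]
    return solve(gram, moment)

# Hypothetical training set: rows are [feature_1, feature_2, 1];
# desired value 1 for target chips and 0 for clutter chips.
F = [[5.0, 1.0, 1.0], [6.0, 1.5, 1.0], [5.5, 0.5, 1.0],
     [1.0, 1.0, 1.0], [0.5, 2.0, 1.0], [1.5, 1.5, 1.0]]
d = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

W = least_squares_weights(F, d)
outputs = [sum(w * f for w, f in zip(W, row)) for row in F]
print([round(y, 3) for y in outputs])
```

At the solution the error vector d - FW is orthogonal to every column of F, which is the defining property of the least squares fit.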

Since the feature vector F is a function of the parameter μ, the problem we are facing
is in fact a parametric least squares problem, which does not have a closed form solution
due to the nonlinear dependence on the parameter μ. There are two possibilities for solving
this problem. Either we first determine the best values of $\mu_m$ and $\mu_n$ through an exhaustive
search in the parameter space, or we use an iterative approach to find the
weight vector, $\mu_m$ and $\mu_n$ together.

One remark is necessary at this point. Training of an adaptive system needs to follow
certain steps for good results [1]. In particular, the training samples should be
representative of the conditions found during testing, and they should outnumber the free
parameters in the network by at least 10:1. For the QGD the minimum number of training
exemplars is easily met. The training and testing procedures are depicted in Figure 22.

The  methods  of  finding  a weight  vector  are  further  discussed  in  the  following  two 
subsections. 

4.3.1.2  Iterative Solution Based on Gradient Descent

An optimal set of μ_m and μ_n can be determined through a search of the parameter space so as to provide the minimum number of false alarms for 100% detection in the training set. However, the search over μ is two-dimensional, so finding μ in this way is computationally expensive. Both the weight vector W and the values of μ_m and μ_n can instead be computed with a gradient descent method (the LMS algorithm), borrowed from adaptive signal processing theory. It is known that the LMS method converges to the Moore-Penrose solution of the overdetermined least squares problem [35].

Using the method of gradient descent, the weights and the parameters are adjusted in an iterative manner along the error surface with the aim of moving them progressively toward the optimum solution. Figure 23 depicts an adaptation scheme for the QGD parameters and weights.

Figure 22 Training and testing of the Quadratic Gamma Detector (QGD): a) training phase of QGD; b) testing phase of QGD.

Figure 23 Adaptation scheme for parameters and weights of the QGD.


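In the batch mode of adaptation, weight updating is performed after the presentation of all the training examples (this constitutes an epoch). For a given set with N training examples, a cost function can be defined as

$$E = \frac{1}{N} \sum_{k=1}^{N} E(k) = \frac{1}{Np} \sum_{k=1}^{N} \lvert d(k) - y(k) \rvert^{p} \tag{87}$$

where E(k) = (1/p)|d(k) - y(k)|^p is the p-normed absolute value of the instantaneous error. The adjustments applied to the weights (W) and the parameter (μ) are derived as

$$\Delta W = -\eta \frac{\partial E}{\partial W} = \frac{\eta}{N} \sum_{k=1}^{N} \operatorname{sign}(d(k) - y(k)) \, \lvert d(k) - y(k) \rvert^{p-1} F(k) = \frac{\eta}{N} \sum_{k=1}^{N} (d(k) - y(k)) \, \lvert d(k) - y(k) \rvert^{p-2} F(k) \tag{88}$$

$$\Delta \mu = -\alpha \frac{\partial E}{\partial \mu} = \frac{\alpha}{N} \sum_{k=1}^{N} (d(k) - y(k)) \, \lvert d(k) - y(k) \rvert^{p-2} \, W^{T} F'(k) = \frac{\alpha}{N} W^{T} \sum_{k=1}^{N} \operatorname{sign}(d(k) - y(k)) \, \lvert d(k) - y(k) \rvert^{p-1} F'(k) \tag{89}$$

where

$$F' = \frac{\partial F}{\partial \mu} \tag{90}$$

and

$$\operatorname{sign}(\psi) = \begin{cases} 1 & \text{if } \psi > 0 \\ -1 & \text{if } \psi < 0 \end{cases} \tag{91}$$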


We can compute the derivative of the feature vector as
(n+1)


(V^l+/2)  ^ ^ 


(93) 


(<?«,  |X„  (^1’ ^2^  Sn+  ^ 


The adaptations of W and μ_n can be summarized as

$$\begin{aligned} W(k+1) &= W(k) + \Delta W(k) \\ \mu_n(k+1) &= \mu_n(k) + \Delta \mu_n(k) \end{aligned} \tag{94}$$
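The weight part of this batch update can be sketched as follows. This is a minimal illustration under stated assumptions: the output is taken as the linear combination y = W^T F of a fixed feature vector, only W is adapted (adapting μ would also need the feature derivative of (92), which is not reproduced here), and the function name `batch_lp_step`, the step size and the toy data are hypothetical.

```python
def sign(v):
    return (v > 0) - (v < 0)

def batch_lp_step(W, F, d, eta, p=2):
    """One epoch of Eq. (88): W <- W + (eta/N) * sum_k sign(e_k) |e_k|^(p-1) F(k)."""
    N = len(F)
    grad = [0.0] * len(W)
    for f_k, d_k in zip(F, d):
        y_k = sum(w * f for w, f in zip(W, f_k))   # linear output y = W^T F
        e_k = d_k - y_k                            # instantaneous error
        g = sign(e_k) * abs(e_k) ** (p - 1)
        for i, f in enumerate(f_k):
            grad[i] += g * f
    return [w + eta * g / N for w, g in zip(W, grad)]

# Toy data (assumed): the exact least squares solution is W = [2, -1].
F = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
d = [2.0, -1.0, 1.0, 3.0]
W = [0.0, 0.0]
for epoch in range(2000):
    W = batch_lp_step(W, F, d, eta=0.1, p=2)
```

With p = 2 the iteration approaches the least squares solution of the same data, which is the behaviour the text attributes to the LMS method; other values of p change the error weighting but not the structure of the update.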


One problem of adapting μ is that each iteration has to revisit the 2-D image domain and run a convolution operation to extract the feature vector F, which is computationally intensive. Since the gamma kernels are circularly symmetric, we first project the convolved image onto the radial direction and then use the already developed 1-D algorithm to adapt μ. This is discussed in detail in the following section.




4.3.2  1-D  Implementation 

The feature vector F is constructed from the convolution of gamma kernels and training subimages. The adaptation of μ requires the extraction of a new feature vector F based on a different kernel shape; that is, we would have to go back to the image plane and compute a new feature vector F after every epoch. The 2-D convolution takes O(N^2) multiplications, so adapting μ at every training epoch would be computationally very expensive even for small training subimages. Fortunately, the 2-D gamma kernels are circularly symmetric, so the result of the convolution can be projected radially to 1-D without loss of information (see Figure 24) as follows: the pixel intensities in concentric annuli around the pivot point are converted to a 1-D signal, and this signal is then convolved with 1-D gamma kernels.
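The radial projection can be sketched in a few lines. The sketch assumes one-pixel-wide annuli indexed by the rounded distance to the pivot, and it sums (rather than averages) the intensities in each annulus so that weighting the 1-D signal with radial kernel samples reproduces the 2-D weighted sum at the pivot. The kernel samples used here are arbitrary placeholders, not gamma kernel values.

```python
import math

def ring_index(r, c, pivot):
    """Index of the one-pixel-wide annulus that pixel (r, c) falls into."""
    return int(round(math.hypot(r - pivot[0], c - pivot[1])))

def radial_sums(image, pivot, n_rings):
    """Convert a 2-D subimage into a 1-D signal of per-annulus intensity sums."""
    sums = [0.0] * n_rings
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            k = ring_index(r, c, pivot)
            if k < n_rings:
                sums[k] += value
    return sums

def feature_1d(profile, kernel_1d):
    """Weighted sum of the radial signal with 1-D kernel samples."""
    return sum(s * g for s, g in zip(profile, kernel_1d))

def feature_2d(image, pivot, kernel_1d):
    """Same feature computed directly in the image plane, for comparison."""
    total = 0.0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            k = ring_index(r, c, pivot)
            if k < len(kernel_1d):
                total += value * kernel_1d[k]
    return total

image = [[float((3 * r + 5 * c) % 7) for c in range(9)] for r in range(9)]
pivot = (4, 4)
kernel = [0.0, 0.1, 0.3, 0.4, 0.2]   # hypothetical radial kernel samples
profile = radial_sums(image, pivot, len(kernel))
```

Once `profile` has been computed for a subimage, changing the kernel only requires a new 1-D weighted sum, which is why the projection makes repeated adaptation of the kernel parameter affordable.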


Figure  24  Image  converted  into  1-D  radial  energy. 

Therefore, we convert the training subimages to 1-D radial profiles, generate 1-D gamma kernels and utilize the recursive implementation of the gamma kernel to perform the convolution. This scheme removes much of the computational burden of the 2-D implementation. The 2-D gamma kernels are reduced to the form of 1-D gamma kernels.

2nn


(/p/s) 


(95) 


The adaptation of the weights W and of μ can be performed using the LMS (least mean square) algorithm in 1-D.


4.4  Comparison of the QGD with the CFAR and the γCFAR Detectors

To draw a parallel between the QGD decision function (in (81)) and the two-parameter CFAR detector (in (51)), we first observe that  and  are  estimates  of  the  local
mean  and  variance,  respectively,  around  the  test  cell  X,  and  that  the  standard  deviation 
can  be  computed  from  the  first  and  the  second  moments  by  (52)  and  (53).  (54)  can  be 
interpreted  as  a linear  decision  function  with  a fixed  set  of  weights  for  a given  Tf-yAji-  The 
measurements  that  appear  in  this  equation  correspond  closely  to  some  of  those  used  in  the 
QGD  (in  (81)).  Specifically,  ^ • X has  the  same  role  as  X^ , and  g„  ^ * X corresponds 
to  the  local  mean,  X^.  Similarly,  for  the  second  moment  we  have  correspondence  between 
g„  II  * ^^and  X^.  With  this  formulation  for  the  QGD  we  preserved  the  similarity  to  the 
two  parameter  CFAR  detector  (the  types  of  features  used)  but  have  generalized  it  with 
respect  to:  (1)  the  shape  of  the  kernels  used  for  the  estimation  of  the  mean  and  variance 
and  (2)  the  selection  of  the  weights  of  the  decision  function  which  are  not  chosen  a priori 
but  are  found  through  optimization.  Note  also  that  the  QGD  has  two  additional  linear 
terms  and  a bias  term.  The  two  parameter  CFAR  detector  is  therefore  a special  case  of  the 
QGD.  In  this  formulation  however,  we  can  no  longer  guarantee  the  constant  false  alarm 
rate  property  of  the  two  parameter  CFAR  which  is  achieved  for  Gaussian  clutter  statistics. 


4.5  Artificial  Neural  Networks  (ANNs) 


4.5.1  Introduction 


Human beings perform many complex tasks, such as speech and image recognition, with relative ease, yet these tasks are very difficult to solve using traditional algorithmic computing techniques. Much effort has been devoted to the development of machines which have human-like information processing capabilities. A neural network model, a simplified model of the brain, is motivated by the neuroanatomy of living animals, with cells corresponding to neurons, activations corresponding to neuronal firing rates, connections corresponding to synapses, and connection weights corresponding to synaptic strengths.

The neural network model is also called a connectionist network and consists of a set of computational units and connection weights between the units, with the processing tasks distributed across many units. Most current neural network models are far from realistic biological neural models, but they serve as good models for the essential information processing that organisms perform. In this section, one of the most popular neural models, the multi-layer perceptron (MLP), is discussed along with its learning algorithms and its validation and generalization problems. The MLP is used here to extend the QGD into a nonlinear structure, which is discussed in Section 4.6.

4.5.2  Multi-layer Perceptrons (MLPs)

The multi-layer perceptron (MLP) has played a central role in neural network modeling and is probably the most widely used ANN. An MLP is a feedforward network in which the network input is propagated forward through several processing layers before the network output is obtained. Each layer is composed of a number of nodes, and each node forms a weighted sum of inputs from the nodes in the previous layer and transforms the sum through a bounded, monotonically increasing nonlinearity. A multi-layer feed-forward network is shown in Figure 25.

Learning is accomplished by minimizing the error between outputs and target values [68], or by maximizing an entropy measure on the outputs.

Network learning based on gradient descent requires knowledge of the derivatives of the activation functions associated with the neurons, so the activation functions need to be differentiable. The sigmoidal and hyperbolic tangent functions are commonly used as differentiable activation functions in MLPs. With them, the network output is a continuously differentiable function of every weight in the network, which enables training with gradient descent rules. The availability of such learning algorithms popularized the MLP. The model structure does not depend on the learning rule. It is known that in classification tasks a three-layer MLP with threshold activation functions can represent an arbitrary decision boundary to arbitrary accuracy [38]. For functional approximation, a three-layer MLP with sufficient nodes in the hidden layer can approximate any continuous nonlinear function to an arbitrary degree [37]. An MLP can therefore serve as a general framework for representing nonlinear functional mappings between the input space and the output space.

4.5.3  Training  MLPs 

A neural network has the ability to learn from its environment. It adjusts its free parameters iteratively during its learning period so as to minimize or maximize a criterion function. There are two general learning paradigms, supervised and unsupervised learning. In supervised learning the output of a network is compared with a desired response, and the error between the two is used to correct the network parameters. In unsupervised learning no desired responses are provided to the network; the network discovers for itself interesting categories or features in the input data. In the mid-eighties, a gradient descent learning algorithm called Back Propagation (BP), which enables MLPs to learn arbitrary functional mappings, stimulated considerable interest in learning systems. BP is a supervised learning mechanism for feedforward networks that uses a cost function and gradient descent, and it is the most widely used training algorithm. This section discusses BP algorithms (on-line learning and batch learning) for a network having a feed-forward topology and differentiable nonlinear activation functions, for the case of a differentiable cost function.

4.5.3.1  On-line Learning

In the on-line learning process, a network uses only the information provided by a single training example {x(n), d(n)} when the network parameters are adapted. Consider an epoch of N training examples in the following order: {x(1), d(1)}, ..., {x(N), d(N)}. For a training pattern, each node computes a weighted sum of inputs of the form

$$v_j(n) = \sum_{i} w_{ji}(n) \, y_i(n) \tag{96}$$

where y_i(n) is the activation of a node or input which is connected to node j, and w_ji(n) is the weight associated with that connection. The sum in (96) is transformed by a nonlinear activation function f(.) to give the activation y_j(n) of node j in the form

$$y_j(n) = f(v_j(n)) \tag{97}$$

Since the activation outputs are computed successively layer by layer, this process is often called forward propagation. The instantaneous error signal at the output of node k at iteration n is defined by

$$e_k(n) = d_k(n) - y_k(n) \tag{98}$$

On-line back propagation learning generally attempts to minimize the sum of squared instantaneous errors (SSE) at the nodes in the output layer. The SSE is defined by

$$E(n) = \frac{1}{2} \sum_{k=1}^{K} e_k^{2}(n) \tag{99}$$

where K is the number of nodes in the output layer. For a given training set, E(n) represents the cost function as a measure of training set learning performance. The objective of the learning process is to adjust the free parameters of the network so as to minimize the cost function. The adjustments to the weights are made in accordance with the respective errors computed for each training example presented to the network.

Gradient descent learning algorithms adapt the weights in the direction of the negative gradient of the cost function. The correction Δw_ji(n) applied to w_ji(n) is defined by the delta rule

$$\Delta w_{ji}(n) = -\eta \frac{\partial E(n)}{\partial w_{ji}(n)} \tag{100}$$

where η is the learning rate of the BP algorithm. The learning rate controls the speed of network training and affects the network stability. For a node j in the output layer, the partial derivative ∂E(n)/∂w_ji(n) can be factorized by the chain rule as follows:

$$\frac{\partial E(n)}{\partial w_{ji}(n)} = \frac{\partial E(n)}{\partial v_j(n)} \frac{\partial v_j(n)}{\partial w_{ji}(n)} = -\delta_j(n) \frac{\partial v_j(n)}{\partial w_{ji}(n)} \tag{101}$$

where δ_j(n) = -∂E(n)/∂v_j(n) is the local gradient at a node in the output layer, that is, the sensitivity of the cost function with respect to the output of node j before the activation function. Differentiating v_j(n) with respect to w_ji(n) yields

$$\frac{\partial v_j(n)}{\partial w_{ji}(n)} = y_i(n) \tag{102}$$

The local gradient δ_j(n) at a node j in the output layer is obtained by

$$\delta_j(n) = e_j(n) \, f'(v_j(n)) \tag{103}$$

Accordingly, with (102) and (103) the correction Δw_ji(n) can be written as

$$\Delta w_{ji}(n) = \eta \, \delta_j(n) \, y_i(n) \quad \text{at a node } j \text{ in the output layer} \tag{104}$$

For a node in a hidden layer, the local gradient can be computed recursively in terms of the local gradients of all nodes in the next layer in the forward direction:

$$\delta_j(n) = -\frac{\partial E(n)}{\partial v_j(n)} = -\sum_{k} \frac{\partial E(n)}{\partial v_k(n)} \frac{\partial v_k(n)}{\partial v_j(n)} = \sum_{k} \delta_k(n) \frac{\partial v_k(n)}{\partial v_j(n)} \quad \text{at a node } j \text{ in a hidden layer} \tag{105}$$

where the sum runs over all units k to which the node j sends connections. The net activation level at node k is

$$v_k(n) = \sum_{j} w_{kj}(n) \, y_j(n) \tag{106}$$

The partial derivative in (105) can be rewritten as

$$\frac{\partial v_k(n)}{\partial v_j(n)} = \frac{\partial v_k(n)}{\partial y_j(n)} \frac{\partial y_j(n)}{\partial v_j(n)} = f'(v_j(n)) \, w_{kj}(n) \tag{107}$$

Thus, using (105) and (107), we obtain the following back-propagation formula:

$$\delta_j(n) = f'(v_j(n)) \sum_{k} \delta_k(n) \, w_{kj}(n) \quad \text{at a node } j \text{ in a hidden layer} \tag{108}$$

The local gradient at a node in the jth hidden layer can be computed by propagating the local gradients backwards from the nodes in the (j+1)th layer.
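The forward pass, the two local-gradient formulas and the delta rule can be put together in a short sketch. It assumes a single hidden layer, logistic activations (so that f'(v) = y(1 - y)) and no bias terms; the function names, the weights and the training example are illustrative only.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, W1, W2):
    """Forward propagation, Eqs. (96)-(97): weighted sums followed by f(.)."""
    y1 = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y2 = [sigmoid(sum(w * yi for w, yi in zip(row, y1))) for row in W2]
    return y1, y2

def sse(x, d, W1, W2):
    """Instantaneous sum of squared errors, Eq. (99)."""
    _, y2 = forward(x, W1, W2)
    return 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y2))

def backprop_deltas(x, d, W1, W2):
    y1, y2 = forward(x, W1, W2)
    # Output layer, Eq. (103): delta_k = e_k * f'(v_k).
    delta2 = [(dk - yk) * yk * (1.0 - yk) for dk, yk in zip(d, y2)]
    # Hidden layer, Eq. (108): delta_j = f'(v_j) * sum_k delta_k * w_kj.
    delta1 = [
        yj * (1.0 - yj) * sum(delta2[k] * W2[k][j] for k in range(len(W2)))
        for j, yj in enumerate(y1)
    ]
    return y1, delta1, delta2

def online_update(x, d, W1, W2, eta):
    """Delta rule, Eq. (104): w_ji <- w_ji + eta * delta_j * y_i."""
    y1, delta1, delta2 = backprop_deltas(x, d, W1, W2)
    W2n = [[w + eta * delta2[k] * y1[j] for j, w in enumerate(row)]
           for k, row in enumerate(W2)]
    W1n = [[w + eta * delta1[j] * x[i] for i, w in enumerate(row)]
           for j, row in enumerate(W1)]
    return W1n, W2n

W1 = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]   # 2 inputs -> 3 hidden nodes
W2 = [[0.6, -0.1, 0.3]]                        # 3 hidden nodes -> 1 output
x, d = [1.0, -2.0], [1.0]
W1_new, W2_new = online_update(x, d, W1, W2, eta=0.5)
```

A finite-difference comparison of `sse` against `delta1[j] * x[i]` is a convenient way to check such an implementation, since by (101) the product equals the negative partial derivative of E(n) with respect to the corresponding weight.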

4.5.3.2  Learning Rate and Momentum

Network training based on the gradient descent method requires choosing a suitable value for the learning rate η. The effectiveness and convergence of the BP learning algorithm depend significantly on this value. When the curvature of the error surface varies significantly with direction in the weight space, a large value of η results in oscillation on the error surface, while for a fairly flat error surface a small value of η results in very slow convergence of the error minimization process, because the adaptation applied to a weight is proportional to the derivative of E with respect to that weight. Only small learning rates guarantee a true gradient descent, at the cost of increasing the total number of learning steps needed to reach a satisfactory solution. One of the simplest methods of accelerating convergence is the addition of a momentum term to the weight adaptation in (104),

$$\Delta w_{ji}(n) = \eta \, \delta_j(n) \, y_i(n) + \alpha \, \Delta w_{ji}(n-1) \tag{109}$$

where α is the momentum rate and Δw_ji(n-1) is the weight correction made when the (n-1)th training example was presented at the network input.

Summary of Back Propagation Algorithm


1.  Determine a network topology, initialize all weights randomly, and set a learning rate and a momentum rate.

2.  Repeat until the network performance is satisfactory.

2.1  Forward propagation step: feed the training examples forward through the network and compute the activations of all nodes.

2.2  Calculate the errors in the output layer and compute the local gradients at all nodes:

$$\delta_j(n) = e_j(n) \, f'(v_j(n)) \quad \text{at node } j \text{ in the output layer} \tag{110}$$

$$\delta_j(n) = f'(v_j(n)) \sum_{k} \delta_k(n) \, w_{kj}(n) \quad \text{at node } j \text{ in a hidden layer} \tag{111}$$

2.3  Update the weights:

$$w_{ji}(n+1) = w_{ji}(n) + \eta \, \delta_j(n) \, y_i(n) + \alpha \, \Delta w_{ji}(n-1) \tag{112}$$


4.5.3.3  Batch Learning

In on-line BP learning, the weight adaptation is performed after the presentation of each training example. The local gradient estimates use only a single piece of training information when the weight adaptation is performed.

In batch BP learning, the weights are adjusted after presenting the entire set of training examples. Batch learning provides a more accurate estimate of the gradient vector. The instantaneous squared errors are summed over all the training examples, and the average of the total squared error can be defined as a cost function

$$E_{av} = \frac{1}{2N} \sum_{n=1}^{N} \sum_{j=1}^{K} e_j^{2}(n) \tag{113}$$

where N is the number of training examples. The correction Δw_ji applied to a weight w_ji is proportional to the negative gradient of the cost function:

$$\Delta w_{ji} = -\eta \frac{\partial E_{av}}{\partial w_{ji}} = -\frac{\eta}{N} \sum_{n=1}^{N} e_j(n) \frac{\partial e_j(n)}{\partial w_{ji}} \tag{114}$$
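The batch correction and the momentum term can be combined in a small sketch. To keep it short it assumes a single linear output unit, for which the derivative of e(n) with respect to w_i is simply -x_i(n); the function names, learning rate, momentum rate and data are illustrative choices, not values from this work.

```python
def batch_gradient(w, X, d):
    """Gradient of E_av = (1/2N) sum_n e(n)^2 for a linear unit y = w . x."""
    N = len(X)
    grad = [0.0] * len(w)
    for x, dn in zip(X, d):
        e = dn - sum(wi * xi for wi, xi in zip(w, x))
        for i, xi in enumerate(x):
            grad[i] -= e * xi / N      # e(n) * de(n)/dw_i, with de/dw_i = -x_i
    return grad

def momentum_step(w, grad, prev_delta, eta, alpha):
    """Delta w(n) = -eta * grad + alpha * Delta w(n-1), cf. Eq. (109)."""
    delta = [-eta * g + alpha * p for g, p in zip(grad, prev_delta)]
    return [wi + di for wi, di in zip(w, delta)], delta

def train(X, d, eta, alpha, epochs):
    w = [0.0] * len(X[0])
    delta = [0.0] * len(X[0])
    for _ in range(epochs):
        w, delta = momentum_step(w, batch_gradient(w, X, d), delta, eta, alpha)
    return w

def distance(w, target):
    return sum((a - b) ** 2 for a, b in zip(w, target)) ** 0.5

# Toy data (assumed): the minimizer of E_av is w = [2, -1].
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
d = [2.0, -1.0, 1.0, 3.0]
w_plain = train(X, d, eta=0.05, alpha=0.0, epochs=100)
w_momentum = train(X, d, eta=0.05, alpha=0.8, epochs=100)
```

On this elongated error surface the run with momentum ends closer to the minimizer after the same number of epochs, which is the acceleration effect described in the text; the gain depends on the curvature of the surface and on the chosen rates.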


4.5.4  Validation of a Neural Model

In any learning system, the validation of system learning is one of the most important steps, because the objective of network training is not to learn an exact representation of the training data itself, but rather to learn the underlying statistics of the process which generates the data. Therefore, during the learning phase, one should assess how well the network has learned the training data and how well it is able to generalize to unseen data. In order to develop quantitative techniques to evaluate a network's performance with real-world data, rigorous mathematical foundations must be developed to determine the characteristics of the training set and the network's ability to generalize from the training set. Although the question of whether a network possesses the ability to generalize correctly (or sufficiently accurately) is still unsolved, most learning algorithms can successfully learn a set of training examples given a sufficiently flexible model structure or an appropriate learning algorithm. A popular method of assessing the validity of trained networks is to split the available data into two sets, a training set and a test set. The training set is further split into two subsets: one used for training the network and the other for evaluating the performance of the network during the training phase [80].

4.5.4.1  Bias/Variance Dilemma

A network model that is too inflexible will have a large bias, while one that is too flexible will have a large variance. This is known as the bias/variance dilemma [28]. The MSE performance measure for a given training set can be decomposed into the sum of two components, which reflect the squared bias of the network error and the variance of the estimates of the trained network.

Let D denote a training set, y(x, D) be the output of a network trained using the data contained in D, and E_D(.) denote the expectation operator taken over all possible training sets. The output error is then given by

$$\varepsilon_y(x) = y(x) - y(x, D) \tag{115}$$

The effectiveness of the network as a predictor of y(x), measured by the MSE over all possible training sets D, is given by

$$E_D\big(\varepsilon_y^{2}(x)\big) = E_D\big((y(x) - y(x, D))^{2}\big) = \big(y(x) - E_D(y(x, D))\big)^{2} + E_D\big(\big(y(x, D) - E_D(y(x, D))\big)^{2}\big) \tag{116}$$

The first term on the right-hand side is the square of the bias, and the second term is the variance of the network approximations. The bias measures the average modelling error, and the variance measures how sensitive the network model is to a particular choice of data set. When the network is too flexible, a large variance occurs, so that the performance of the network is very sensitive to the particular training set. This results in a poor MSE performance. The network also produces a poor MSE performance when it possesses too little flexibility, which causes a large bias. Bias and variance are complementary quantities. In general, a network should be flexible enough to ensure that the modelling error (bias) is small, but it should not be over-parameterized, because its performance then becomes highly sensitive to the particular training set (high variance).
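The decomposition follows from adding and subtracting the average network output inside the square. A short derivation is sketched below, with y(x) the quantity to be predicted and y(x, D) the output of the network trained on D; the shorthand \bar{y}(x) for E_D[y(x, D)] is introduced here only for compactness.

```latex
\begin{aligned}
E_D\big[(y(x) - y(x,D))^2\big]
  &= E_D\Big[\big((y(x) - \bar{y}(x)) + (\bar{y}(x) - y(x,D))\big)^2\Big],
     \qquad \bar{y}(x) \equiv E_D[y(x,D)] \\
  &= (y(x) - \bar{y}(x))^2
     + 2\,(y(x) - \bar{y}(x))\,E_D\big[\bar{y}(x) - y(x,D)\big]
     + E_D\big[(y(x,D) - \bar{y}(x))^2\big] \\
  &= \underbrace{(y(x) - \bar{y}(x))^2}_{\text{bias}^2}
     + \underbrace{E_D\big[(y(x,D) - \bar{y}(x))^2\big]}_{\text{variance}}
\end{aligned}
```

The cross term vanishes because y(x) - \bar{y}(x) does not depend on D and E_D[\bar{y}(x) - y(x, D)] = 0 by the definition of \bar{y}(x).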

4.5.4.2  Network Complexity and Early Stopping

The problem of choosing the network complexity of an MLP is that of choosing the size of the free parameter set used to model the data set (in terms of the number of nodes and hidden layers). A good estimate of the true performance of a network is required to determine whether the complexity of a network model is appropriate for a particular data set. If increasing the model complexity results in a lower modelling error, this indicates that the original network was not flexible enough. Similarly, if a more flexible model produces a higher modelling error, this indicates that the network has overfitted the training data set, and a simpler network would be preferred for better generalization.

An alternative way of obtaining the effective complexity of a network is the procedure of early stopping. Typically, iterative training of a learning system reduces the cost function on the training set as more iterations are made during the training phase. However, the generalization error measured with respect to a validation set usually decreases at first and then starts increasing after a certain iteration, at which point further training causes the network to over-fit the training set. For good generalization, training can be stopped at the point where the generalization error starts increasing. This is referred to as cross-validation, and the procedure provides an appealing guide to better generalization performance of the network.
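The stopping rule can be sketched as a small function over a recorded sequence of validation errors. The `patience` parameter (how many epochs without improvement are tolerated before stopping) is an assumption added here to make the rule robust to small fluctuations; the text itself only requires stopping where the validation error starts to increase. The error values are invented for illustration.

```python
def early_stopping_epoch(validation_errors, patience=3):
    """Return (best_epoch, stop_epoch) for per-epoch validation errors."""
    best_epoch, best_error = 0, float("inf")
    for epoch, err in enumerate(validation_errors):
        if err < best_error:
            best_epoch, best_error = epoch, err      # new minimum: keep these weights
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch                 # no improvement for `patience` epochs
    return best_epoch, len(validation_errors) - 1

# Hypothetical validation curve: decreases, then rises as over-fitting sets in.
val = [0.90, 0.60, 0.45, 0.40, 0.42, 0.41, 0.47, 0.55]
best, stopped = early_stopping_epoch(val, patience=3)
```

In practice the weights saved at `best` would be the ones retained, while `stopped` marks the epoch at which training is abandoned.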


4.6  Nonlinear  Extension  To  QGD  (NL-QGD) 

4.6.1  Introduction 

One of the possible extensions to the QGD is to augment the output adder with a set of nonlinear processing elements which nonlinearly combine the feature elements, i.e. to implement a neural classifier such as an MLP. The structure is called the NL-QGD (Non-Linear extension to QGD). Since the MLP is capable of creating arbitrary discriminant functions, this extension has the potential to improve performance. Note that the QGD creates a quadratic discriminant function of the image intensity, which is only optimal for unimodal probability density functions [8] [27]. Moreover, a nonlinear system normally generalizes better, so the performance in the test set can also improve.

In order to fully use the neural network approach we have to develop an iterative algorithm to adapt both the weights and μ (using the backpropagation procedure). The availability of on-line methods of adapting μ effectively means that the detector output error can be used for refining μ or, equivalently, for searching on-line for the optimal guard band. The relationship between the minimum number of false alarms and the minimum mean square error becomes crucial now, since the system will be adapting μ for the smallest MSE. The cost function used should correspond to the minimum false alarm rate, since this is the basis of scoring. Alternative error norms may have to be used to adapt the weights to guarantee that the minimum cost corresponds to the minimum false alarm rate.

Figure 26 displays the block diagram of the NL-QGD. As indicated in this figure, the feature expansion is still the same as that of the QGD. In Figure 21, we can think of the QGD as the linear part of one of the hidden layer processing elements. The two basic issues that need to be addressed when using a neural network are the training and the net topology.

4.6.2  Training  the  NL-QGD 

In order to fully develop the neural network approach, an iterative learning scheme is required to adapt the weights and the parameter μ. The availability of on-line methods of adapting μ effectively means that the detector output error can be used to search on-line for the optimal local area and guard band.

The NL-QGD is trained with a desired signal (1s for the target class and 0s for the non-target class) using a back-propagation algorithm [68].

$$d(n) = \begin{cases} 1 & \text{feature vector belongs to the target class} \\ 0 & \text{feature vector belongs to the non-target class} \end{cases} \tag{117}$$

The sum of squared errors is utilized in this work, i.e.

$$\min_{W} E = \frac{1}{2} \sum_{n=1}^{N} (d(n) - z(n))^{2} \tag{118}$$

with z(n) the detector output for the nth training example.

Figure 26 Adaptation scheme of the NL-QGD

Moreover, in order to adapt the parameter μ, which controls the scale of the γCFAR stencil, the error generated at the detector output is back-propagated up to the input layer. The decision boundary of the NL-QGD is therefore formed in conjunction with the parameter μ. The correction Δμ(n) at each iteration is proportional to the instantaneous gradient ∂E(n)/∂μ(n). According to the chain rule, this gradient is expressed as follows:

$$\frac{\partial E(n)}{\partial \mu(n)} = \sum_{i} \frac{\partial E(n)}{\partial v_i(n)} \frac{\partial v_i(n)}{\partial \mu(n)} \tag{119}$$

where the index i runs over the nodes in the first hidden layer. v_i(n) is the output before the nonlinear transformation is applied at the ith node in the first hidden layer and is expressed by

$$v_i(n) = \sum_{p=1}^{P} w_{ip}(n) \, f_p(n) \tag{120}$$

where the feature element f_p of F is given by (80) at each iteration n and F(n) = [f_1(n), f_2(n), ..., f_P(n)]^T with P = 8.

The local gradients ∂E(n)/∂v_i(n) can be computed by (105). Therefore, (119) is written with the partial derivative of v_i(n) with respect to μ(n) as

$$\frac{\partial E(n)}{\partial \mu(n)} = -\sum_{i} \delta_i(n) \sum_{p=1}^{P} w_{ip}(n) \frac{\partial f_p(n)}{\partial \mu(n)} = -\sum_{p=1}^{P} \Big( \sum_{i} \delta_i(n) \, w_{ip}(n) \Big) \frac{\partial f_p(n)}{\partial \mu(n)} = -\sum_{p=1}^{P} \delta_p(n) \frac{\partial f_p(n)}{\partial \mu(n)} = -\Delta_P^{T}(n) \frac{\partial F(n)}{\partial \mu(n)} \tag{121}$$

where δ_i(n) and δ_p(n) are the local gradients at the ith node in the first hidden layer and the pth node in the input layer, respectively. The local gradient vector in the input layer is defined as Δ_P(n) = [δ_1(n) δ_2(n) ... δ_P(n)]^T. w_ip(n) is the weight between the pth node in the input layer and the ith node in the first hidden layer. The gradient ∂F(n)/∂μ(n) is given by (92).

The adaptation of the parameter μ is given by

$$\mu_j(n+1) = \mu_j(n) - \beta \frac{\partial E(n)}{\partial \mu_j(n)} = \mu_j(n) + \beta \, \Delta_P^{T}(n) \frac{\partial F(n)}{\partial \mu_j(n)} \tag{122}$$

where β is the step size, and j = n, m (one equation for each kernel).
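The chain of (119)-(122) amounts to back-propagating the output error one step further than usual, down to the input layer, and then multiplying by the derivative of the features with respect to the kernel parameter. The sketch below illustrates only that mechanism. The feature map `features` and its derivative are hypothetical stand-ins for (80) and (92), which are not reproduced here; logistic activations, a single hidden layer, and all numerical values are likewise assumptions.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def features(mu, radii):
    """Hypothetical feature map standing in for Eq. (80): f_p = exp(-mu * r_p)."""
    return [math.exp(-mu * r) for r in radii]

def dfeatures_dmu(mu, radii):
    """Derivative of the hypothetical features, standing in for Eq. (92)."""
    return [-r * math.exp(-mu * r) for r in radii]

def output(mu, radii, W1, w2):
    F = features(mu, radii)
    y1 = [sigmoid(sum(w * f for w, f in zip(row, F))) for row in W1]
    return sigmoid(sum(w * y for w, y in zip(w2, y1))), y1

def error(mu, radii, W1, w2, d):
    z, _ = output(mu, radii, W1, w2)
    return 0.5 * (d - z) ** 2

def grad_mu(mu, radii, W1, w2, d):
    """Eq. (121): dE/dmu = -sum_p delta_p * df_p/dmu, delta_p = sum_i delta_i * w_ip."""
    z, y1 = output(mu, radii, W1, w2)
    delta_out = (d - z) * z * (1.0 - z)
    delta_hidden = [delta_out * w2[i] * y1[i] * (1.0 - y1[i]) for i in range(len(W1))]
    delta_input = [sum(delta_hidden[i] * W1[i][p] for i in range(len(W1)))
                   for p in range(len(radii))]
    return -sum(dp * df for dp, df in zip(delta_input, dfeatures_dmu(mu, radii)))

radii = [0.0, 1.0, 2.0, 3.0]
W1 = [[0.5, -0.3, 0.8, 0.1], [-0.6, 0.4, 0.2, 0.9]]
w2 = [1.2, -0.7]
mu, d, beta = 0.6, 1.0, 0.5
mu_next = mu - beta * grad_mu(mu, radii, W1, w2, d)   # update of Eq. (122)
```

Note that the input-layer local gradients carry no activation derivative, because the features enter the first hidden layer linearly.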


CHAPTER  5 


TRAINING  STRATEGIES  FOR  NL-QGD 
5.1  Introduction 

Neural network models are usually defined by three major parts: architecture, cost function and training algorithm. The cost function typically measures the errors between the network estimates and the actual outputs from the training data, and it plays an important role as a guide for network optimization.

The sum of squares error (SSE) function (L2 norm) has been widely used as an optimality index for network training. Many other cost functions can also be considered, depending on the particular application. Minimizing the SSE is equivalent to the maximum likelihood principle for Gaussian distributed errors in regression problems, where the goal is to model the conditional distribution of the output variables given the input samples.

When used as a classification device, a feed-forward neural network is normally trained with binary desired responses (0 or 1) for the training data. In order to guarantee that the outputs represent probabilities, the sum of the output values should be equal to one and each output value should lie in the range (0, 1). For classification problems, the error distribution is far from Gaussian because the target variables are binary. For two-class classification problems the error has a binomial distribution, and for multi-class classification problems, in which the network has as many output nodes as there are classes, it has a multinomial distribution.

Cross-entropy can also be used as a cost function for network optimization [2] [36] [79]. Minimizing the cross-entropy cost function maximizes the likelihood of observing the training data set. In this sense, the cross-entropy is theoretically the most appropriate cost function for network optimization in the case of binary classification [69].
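The practical difference between the two cost functions shows up in the error signal they send back through the output node. The sketch below assumes a single logistic output y = sigmoid(v) and a binary desired response d; the function names and the example values are illustrative.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def sse(d, y):
    return 0.5 * (d - y) ** 2

def cross_entropy(d, y):
    return -(d * math.log(y) + (1.0 - d) * math.log(1.0 - y))

def dsse_dv(d, v):
    """Derivative of the SSE with respect to the net input v of the output node."""
    y = sigmoid(v)
    return (y - d) * y * (1.0 - y)

def dce_dv(d, v):
    """Derivative of the cross-entropy with respect to v: the sigmoid slope cancels."""
    return sigmoid(v) - d

# A confidently wrong output: desired response 1, net input strongly negative.
d, v = 1.0, -6.0
grad_sse = dsse_dv(d, v)
grad_ce = dce_dv(d, v)
```

For a saturated, wrong output the SSE gradient is scaled down by the small slope of the sigmoid, whereas the cross-entropy gradient stays proportional to the output error, so the correction does not stall.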

However, when a neural network is used as a detector, with a single output node for two classes, it is supposed to detect samples from one particular class and reject as many samples from the other classes as possible. The network then needs to be trained on an optimality condition which is different from that of classification problems. At present this optimality condition has not been formulated, so the problem has to be solved experimentally. Through a search over cost functions for network optimization, the performance of the two detectors (QGD and NL-QGD) is discussed based on the cost functions used for training.


5.2  Optimality  Index 

5.2.1  L2  Norm

One of the most popular norms in training networks is the L2 criterion. For a network with linear outputs, training the network leads to an optimal solution, the least squares solution. The L2 norm is an appropriate choice for normally distributed inputs in the sense of both minimum cost and minimum probability of prediction error (maximum likelihood). Consider a set of training pairs {x(n), d(n)}, n = 1, 2, ..., N. Assume that the input vectors x are drawn randomly and independently from a normal distribution. The objective of network training is not to memorize the training data, but rather to model the underlying generator of the data, so that the best possible prediction of the desired responses d can be made when new data is presented to the trained network. A network training scheme is shown in Figure 27. The most general and complete description of the generator of the data is in terms of the probability density p(x,d) in the joint input-target space [5].

Figure 27  A network training scheme.


The joint probability density of the input and desired responses can be expressed in terms of the conditional probability density of the target data and the unconditional probability density of the input data,

p(x, d) = p(d \mid x)\, p(x)    (123)

where p(d|x) is the probability density of the target variable given an input x, and p(x) is the probability density of the input given by

p(x) = \int p(x, d)\, dd    (124)

By the principle of maximum likelihood, we want to maximize the joint probability density in the input-target space given a set of input and target data. It is often convenient to minimize the negative logarithm of the likelihood p(x,d) instead, since the logarithm is a monotonic function. Because the training pairs are drawn independently, the likelihood is a product over the samples, and we therefore minimize

E = -\ln p(x, d) = -\sum_{n} \ln p(d(n) \mid x(n)) - \sum_{n} \ln p(x(n))    (125)

Since the second term in (125) is independent of the network parameters, minimizing E is equivalent to minimizing the first term, so the error function can be written as

E = -\sum_{n} \ln p(d(n) \mid x(n))    (126)

We assume that the input data has a Gaussian distribution and that the target value is a deterministic function of the input with added Gaussian noise e_k, so that

d_k = h_k(x) + e_k    (127)


We now want to model the deterministic function h_k(x) by a neural network with output y_k(x;w), where w is the set of weight parameters determining the network mapping. The distribution of e_k is given by

p(e_k) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{e_k^2}{2\sigma^2} \right)    (128)

Using (127) and (128), the probability density of the target variables is given by

p(d_k \mid x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{\{ y_k(x;w) - d_k \}^2}{2\sigma^2} \right)    (129)


Here the mapping function between input and target, h_k(x), has been replaced by the network model y_k(x;w). Substituting (129) into (126), we obtain the error function [5]

E = \frac{1}{2\sigma^2} \sum_{n=1}^{N} \sum_{k=1}^{C} \{ y_k(x(n);w) - d_k(n) \}^2 + NC \ln\sigma + \frac{NC}{2} \ln(2\pi)    (130)

where C is the number of classes and N the number of exemplars. In (130), the second and third terms are independent of the network parameters w and can be omitted. The premultiplication factor 1/\sigma^2 in the first term can also be omitted. The error function then has the form of the SSE function

E = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{C} \{ y_k(x(n);w) - d_k(n) \}^2    (131)

Here, the SSE cost function was derived from the principle of maximum likelihood on the assumption of Gaussian distributed target data.

5.2.2  Training  with  Excluding  Outliers  from  Non-Target  Class  with  L2  Norm

In the training data for the QGD and NL-QGD, it is observed that some of the target and non-target class data overlap. The two-parameter CFAR or γCFAR detector declares a pixel as detected when its intensity significantly exceeds the mean intensity in its surrounding region, and both targets and man-made objects (non-targets) are usually brighter than the background in their surrounding areas.

The objective of training the NL-QGD (or any detector for that matter) is not to obtain minimum classification error but to achieve a minimum false alarm rate while maintaining a high probability of target detection. The training of neural networks is mostly intended to produce minimum classification error on the given training data and to generalize well to data which has never been seen by the network. This leads to a partitioning of the input space into subregions according to the number of classes and usually places a decision boundary among highly populated regions of the classes.

In two-class detection problems, the input space is partitioned into two subspaces, one for the target class and the other for the non-target class. The NL-QGD produces very low output values for target data which are far away from the decision boundary and reside deep inside the non-target subregion. These low output values severely degrade performance at a high probability of target detection, because the threshold for the NL-QGD is set by the minimum target output value for 100% target detection. It is therefore desired that the decision boundary be placed so as to encompass all target samples in the target subregion and exclude as many non-target samples as possible from the target subregion. This, of course, may not yield minimum classification error but leads to a high probability of detection in favor of one class.
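The operating point described above (threshold equal to the smallest target output, false alarms counted among the non-targets that still pass) can be sketched in a few lines. This is only an illustration: the function name, the toy output values, and the choice to count a non-target output equal to the threshold as a false alarm are assumptions, not taken from the dissertation.

```python
def false_alarms_at_full_detection(target_outputs, nontarget_outputs):
    """Return (threshold, false alarm count) at 100% target detection.

    The threshold is the smallest detector output over all targets, so every
    target passes. Assumption: a non-target output equal to the threshold
    counts as a false alarm.
    """
    if not target_outputs:
        raise ValueError("need at least one target output")
    threshold = min(target_outputs)
    false_alarms = sum(1 for y in nontarget_outputs if y >= threshold)
    return threshold, false_alarms


# Toy values (hypothetical): one target outlier with a low output of 0.2.
targets = [0.9, 0.8, 0.75, 0.2]
nontargets = [0.1, 0.15, 0.3, 0.5, 0.05]
print(false_alarms_at_full_detection(targets, nontargets))
```

In this toy case a single low-scoring target drags the threshold down and lets two non-targets through; raising that one output to 0.6 would remove both false alarms, which is the effect the training strategies in this section aim for.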

Figure 28 illustrates the effect of outliers in placing a decision boundary in the input space. In Figure 28a, the data set contains 15 samples (marked by "*") for the target class and 15 samples (marked by "o") for the non-target class in the input space. The desired responses are 1's and -1's for the target class and non-target class respectively.

In Figure 28a, decision boundary #1 was configured based on the LS solution using all samples, while decision boundary #2 was formed by the LS solution computed after removing four non-target outliers ((-3,1), (-2,2), (-1,2) and (-1,1)) from the data set. By removing the four non-target outliers when computing the LS solution, the decision boundary moves in the direction where the target outliers are located, so that more target-class samples are included above decision boundary #2. Figure 28b plots the outputs of the data set based on the two LS solutions. With decision boundary #1, 10 false alarms occur when all targets are detected, whereas decision boundary #2 yields 8 false alarms for detection of all targets.

There is a lack of theoretical foundation for designing cost functions for network training in such a case. A simple way of implementing the idea is to train the NL-QGD while excluding outliers among the non-target samples during the training period. Outliers in the training data cause low output values for the target class and high output values for the non-target class, which yields large error values that are used to correct the network parameters during training. Error values serve as forces that position the decision boundaries in the direction of the corresponding samples. Therefore removing non-target outliers helps the decision boundary to shift towards the target outliers. This can yield larger NL-QGD output values for target outliers and correspondingly reduces false alarms at the operating point of high probability of target detection.

Figure 28  An example of the effect of outliers on decision boundary formation and false alarm rate.

Training procedure of NL-QGD with removal of non-target outliers

1.  Train NL-QGD with a complete set of training data and cross validation data for a certain number (N1) of iterations.

2.  Compute the false alarms for 100% target detection from the cross validation set.

2.1  if (minFA > FA(iter)) {
       • minFA = FA(iter)
       • stopiter = stopiter + 1
       • remove the non-target exemplar which caused the largest output value among the non-target exemplars of the training data.
     }
     else if (minFA == FA(iter)) {
       if (stopiter < N2) stopiter = stopiter + 1
       else stop training
     }

2.2  train NL-QGD for a certain number of iterations (N3).

2.3  go to step 2
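The procedure above can be sketched as a small control loop. This is a hedged illustration, not the dissertation's code: the three callables (`train_fn`, `false_alarm_fn`, `output_fn`) are hypothetical stand-ins for the network training, the cross-validation false alarm count, and the network output. The listed procedure does not say what happens when the false alarm count rises above `minFA`; the sketch simply keeps training in that case. The `max_rounds` guard is also an added assumption.

```python
def train_with_outlier_removal(train_fn, false_alarm_fn, output_fn,
                               nontarget_train, n1, n2, n3, max_rounds=1000):
    """Sketch of the outlier-removal training loop.

    train_fn(nontargets, iterations): trains the detector (hypothetical hook).
    false_alarm_fn(): false alarms at 100% detection on the validation set.
    output_fn(sample): current detector output for one training sample.
    """
    nontargets = list(nontarget_train)
    train_fn(nontargets, n1)                      # step 1
    min_fa = float("inf")
    stopiter = 0
    for _ in range(max_rounds):                   # safety guard (assumption)
        fa = false_alarm_fn()                     # step 2
        if fa < min_fa:                           # step 2.1, improvement
            min_fa = fa
            stopiter += 1
            if nontargets:
                # drop the non-target exemplar with the largest output
                worst = max(nontargets, key=output_fn)
                nontargets.remove(worst)
        elif fa == min_fa:                        # step 2.1, no change
            if stopiter < n2:
                stopiter += 1
            else:
                break                             # stop training
        # fa > min_fa is not covered by the procedure; keep training
        train_fn(nontargets, n3)                  # step 2.2, then back to 2
    return nontargets, min_fa
```

Passing the training and evaluation steps in as callables keeps the loop independent of any particular network implementation, so the same skeleton could drive the QGD or the NL-QGD.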


5.2.3  Lp  Norm

The L2 norm (SSE function) was derived from Gaussian distributed target variables. The L2 norm weighs all errors equally. When the distributions have long tails, the solution can be significantly affected by a very small number of points, called outliers, which cause particularly large errors. We can obtain more general error functions by extending the Gaussian distribution function to a more general form [5] [7]


p(e) = \frac{p^{1 - 1/p}}{2 \sigma_p \Gamma(1/p)} \exp\left( -\frac{|e|^p}{p\, \sigma_p^p} \right)    (132)

where Γ(.) denotes the Gamma function and σ_p is the dispersion parameter in the Lp sense. For the case of p = 2 this reduces to a Gaussian. Substituting (132) into (125) and omitting constant terms, we obtain a generalized form of the SSE error function

E = \sum_{n=1}^{N} \sum_{k=1}^{C} | y_k(x(n);w) - d_k(n) |^p    (133)

This shows the link between the error distribution and the Lp norm criterion function. Figure 29 shows the error weighting effect of Lp norms according to the norm power p. When p increases, large errors are weighed more heavily than small errors. On the other hand, small errors are weighed more heavily than large errors as p decreases below p = 1. This implies that the decision boundary forms differently in the input space, depending upon the norm.

Figure 29  |d - y|^p versus e for different p.

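The derivatives of the Lp norm error function with respect to the network parameters are [5] given by

\frac{\partial E}{\partial w_{ji}} = -p \sum_{n} \sum_{k} | d_k(n) - y_k(x(n);w) |^{p-1} \, \mathrm{sign}( d_k(n) - y_k(x(n);w) ) \, \frac{\partial y_k(x(n);w)}{\partial w_{ji}}    (134)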


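If we view the Lp norm error function in an asymptotic sense (N → ∞), the error function is

E = \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{C} | d_k(n) - y_k(x(n);w) |^p
  = \sum_{k=1}^{C} P(\omega_k) \int | d_k - y_k(x;w) |^p \, p(x \mid \omega_k)\, dx
  = \sum_{k=1}^{C} P(\omega_k) \int f(x)\, dx    (135)

where

f(x) = | d_k - y_k(x;w) |^p \, p(x \mid \omega_k)    (136)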

For p = 2, the cost function (L2) places emphasis on the errors from samples in densely populated regions, that is, in the regions where p(x) is large, rather than in the regions near the decision boundary. When p is large, the optimization process of the network is dominated by the samples which cause large errors. If p is taken to be small, more emphasis is placed on the region of the decision boundary.

A BP algorithm for MLPs can be used with only a minor modification of the local gradient in the output layer. The local gradient in the output layer is modified as

\delta_l(n) = \varphi'(v_l(n)) \sum_{l=1}^{L} ( d_l(n) - y_l(n) ) \, | d_l(n) - y_l(n) |^{p-2}    (137)

where l and L are a node index and the number of nodes in the output layer respectively. e(n) is an instantaneous error defined as

e(n) = d_l(n) - y_l(n)    (138)

Large norms usually slow down training. This effect can be seen by rewriting (137) in the following form

\delta_l(n) = |e(n)|^{p-2} \, \varphi'(v_l(n)) \sum_{l=1}^{L} \mathrm{sign}( d_l(n) - y_l(n) ) \, |e(n)|    (139)

The weight adaptation is

\Delta w_{li}(n) = \eta\, \delta_l(n)\, y_i(n) + \alpha\, \Delta w_{li}(n-1)
                 = \eta_p\, \varphi'(v_l(n))\, y_i(n) \sum_{l=1}^{L} \mathrm{sign}( d_l(n) - y_l(n) ) \, |e(n)| + \alpha\, \Delta w_{li}(n-1)    (140)

where η_p is the effective learning rate for a norm p, with η_p = η |e(n)|^{p-2}. When |e(n)| < 1, η_p becomes very small for large norms (p > 2), so that convergence becomes very slow. Note that η is the learning rate for p = 2.
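The slowdown for large norms can be made concrete by evaluating the effective learning rate η|e(n)|^(p-2) stated in the text. The sketch below is only an illustration; the function name and the sample values of η, e and p are assumptions.

```python
def effective_learning_rate(eta, error, p):
    """Effective learning rate eta * |e|**(p - 2) for an Lp norm.

    For p < 2 and error == 0 the expression is undefined and Python
    raises ZeroDivisionError.
    """
    return eta * abs(error) ** (p - 2)


eta = 0.1      # base learning rate (hypothetical value)
error = 0.5    # |e(n)| < 1, as with a sigmoidal output node
for p in (2, 4, 8):
    print(p, effective_learning_rate(eta, error, p))
```

With |e(n)| = 0.5 the effective rate shrinks by a factor of four for every increase of p by two, which is why training with large norms converges slowly once the errors are below one.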


5.2.4  Mixed  Lp  Norm


With the L2 norm, all errors from the training data are penalized equally. For p < 2, more emphasis is placed on smaller errors than on large errors, while the Lp norms with p > 2 penalize larger errors more than smaller errors. So, by using mixed norms, different emphasis can be placed on the target class samples and the non-target class samples in training the NL-QGD. It is desirable to use a larger norm (p > 2) for the target class, in order to prevent the NL-QGD from producing very low output values for target outliers, and a smaller norm (p < 2) for the non-target class. Since the smaller norm weighs the smaller errors more, large errors from non-target outliers cannot gain as much force as in the L2 case to pull the decision boundary in their direction. A cost function can thus be defined with a mixed form of two different norms.


E = \left[ \frac{1}{N_{nt}} \sum_{n:\, X \in \omega_{nt}} | d - y(X,w) |^{p_l} \right]^{1/p_l} + \left[ \frac{1}{N_{t}} \sum_{n:\, X \in \omega_{t}} | d - y(X,w) |^{p_h} \right]^{1/p_h}    (141)

where ω_nt and ω_t denote the non-target and target classes, N_nt is the number of non-target samples and N_t is the number of target samples, p_l is a small norm (p_l < 2) and p_h is a large norm (p_h > 2). In order to prevent the sum of p_l-power errors from dominating the sum of p_h-power errors when errors are less than 1 (the case for NL-QGD with a sigmoidal activation function at the output), those sums are normalized by the numbers of their class samples and raised to the powers 1/p_l and 1/p_h respectively. The weight adaptation is performed after presenting all the samples each time (batch training). The local gradient at the output can be written as


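\delta = \frac{1}{N_{nt}} \left[ \frac{1}{N_{nt}} \sum_{n:\, X \in \omega_{nt}} | d - y(X,w) |^{p_l} \right]^{\frac{1}{p_l} - 1} \sum_{n:\, X \in \omega_{nt}} | d - y(X,w) |^{p_l - 1} \, \mathrm{sign}( d - y(X,w) ) \, \varphi'(v_k)
       + \frac{1}{N_{t}} \left[ \frac{1}{N_{t}} \sum_{n:\, X \in \omega_{t}} | d - y(X,w) |^{p_h} \right]^{\frac{1}{p_h} - 1} \sum_{n:\, X \in \omega_{t}} | d - y(X,w) |^{p_h - 1} \, \mathrm{sign}( d - y(X,w) ) \, \varphi'(v_k)

(142)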
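As a rough numerical illustration of the mixed-norm cost (141), the sketch below evaluates the two normalized norm terms for lists of errors d - y. The function and argument names are hypothetical, and the error values are made up to show the effect of the normalization.

```python
def mixed_norm_cost(nontarget_errors, target_errors, p_low, p_high):
    """Mixed Lp cost: small norm on non-target errors, large norm on targets.

    Each class term is the mean of |e|**p raised to the power 1/p.
    Both error lists must be non-empty (otherwise ZeroDivisionError).
    """
    def normalized_norm(errors, p):
        mean_power = sum(abs(e) ** p for e in errors) / len(errors)
        return mean_power ** (1.0 / p)

    return (normalized_norm(nontarget_errors, p_low)
            + normalized_norm(target_errors, p_high))


nontarget_errors = [-0.5, -0.5]   # d - y for non-targets (hypothetical)
target_errors = [0.5, 0.5]        # d - y for targets (hypothetical)

# Raw power sums: the p = 1 sum (1.0) dwarfs the p = 4 sum (0.125).
print(sum(abs(e) ** 1 for e in nontarget_errors),
      sum(abs(e) ** 4 for e in target_errors))
# Normalized terms: both classes contribute on the same scale.
print(mixed_norm_cost(nontarget_errors, target_errors, p_low=1, p_high=4))
```

Taking the mean and the 1/p root puts both class terms back on the scale of the errors themselves, so neither class dominates merely because of its exponent or its sample count.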


5.2.5  Cross  Entropy 

The binomial distribution, one of the most useful discrete distributions, is based on the idea of a Bernoulli trial [69]. A Bernoulli trial is an experiment with only two possible outcomes. Observations of this nature arise, for instance, in medical trials where, at the end of the trial period, a patient has either recovered (d = 1) or has not (d = 0). When a network uses a single output which can meet the probability conditions for two-class classification problems, we want the value of the output, y, to represent the posterior probability p(ω_1|x) for class ω_1, that is, y = p(ω_1|x), and for class ω_2, p(ω_2|x) = 1 - y. In our problem, y = P(target class | input image chip) for the target class (d = 1) and 1 - y = P(non-target class | input image chip) for the non-target class (d = 0). We can combine the two cases into a single expression. The probability of observing either target or non-target is

p(d \mid x) = y^{d} (1 - y)^{1 - d}    (143)

This is the Bernoulli distribution, which is a special case of the binomial distribution. Assuming that the target and non-target data are drawn independently from this distribution, the likelihood function of the training set is expressed by

\prod_{n} p(d(n) \mid x(n)) = \prod_{n} y(n)^{d(n)} (1 - y(n))^{1 - d(n)}    (144)

The goal is to maximize this likelihood function given the input data. For convenience, we again minimize the negative logarithm of the likelihood function, which can be thought of as a cost function (E) for the network optimization [5] [69].

E = -\sum_{n} \{ d(n) \ln y(n) + (1 - d(n)) \ln(1 - y(n)) \}    (145)

This error function is called the cross-entropy error function between the desired responses (d) and the posterior probabilities (y). This shows that the cross entropy is a natural cost function for two-class classification. Now, the derivative of this function with respect to the output is

\frac{\partial E}{\partial y(n)} = \frac{y(n) - d(n)}{y(n)\,(1 - y(n))}    (146)


Note that E = 0 when y(n) = d(n) for all n. With the interpretation of the output activations as probabilities, we want them to range between 0 and 1. Therefore a sigmoidal function is natural and the posterior probability can be written in the form of the logistic function

y = \frac{1}{1 + e^{-q}}    (147)

We see that the derivative of E with respect to q is

\frac{\partial E}{\partial q(n)} = y(n) - d(n)    (148)

This has the same form as for the sum of squares error with linear output units, but the outputs here represent posterior probabilities. q can be a linear discriminant function or a nonlinear discriminant function.

For desired responses of 1 for the target class and 0 for the non-target class, the cross entropy cost function can be written in the following form [5]

E = -\sum_{n \in \text{target class}} \ln(1 - e(n)) - \sum_{n \in \text{non-target class}} \ln(1 + e(n))    (149)

where e(n) = d(n) - y(n). For small e(n), the cross entropy function becomes

E \approx \sum_{n} |e(n)|    (150)

This has the same form as the L1 norm. Small errors are weighed more by the cross entropy function than by Lp norms with p > 2. The cross entropy cost function versus error is plotted in Figure 30. Both small errors and large errors are penalized more by the cross entropy cost function than by Lp norms.


Figure 30  Cross entropy cost function versus error (e = d - y).
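The relation between the cross entropy cost (145) and its small-error L1 approximation (150) can be checked numerically. The following is a minimal sketch using only the standard library; the function names and the sample outputs are illustrative assumptions, and the outputs must lie strictly between 0 and 1 for the logarithms to exist.

```python
import math


def cross_entropy(desired, outputs):
    """Cross entropy cost (145) for desired responses in {0, 1}."""
    return -sum(d * math.log(y) + (1 - d) * math.log(1 - y)
                for d, y in zip(desired, outputs))


def l1_cost(desired, outputs):
    """Sum of absolute errors, the small-error limit (150)."""
    return sum(abs(d - y) for d, y in zip(desired, outputs))


# Small errors: the two costs nearly coincide.
d_small, y_small = [1, 0], [0.99, 0.01]
print(cross_entropy(d_small, y_small), l1_cost(d_small, y_small))

# Large error (a target scored at 0.01): cross entropy penalizes far more.
d_large, y_large = [1], [0.01]
print(cross_entropy(d_large, y_large), l1_cost(d_large, y_large))
```

The second case shows why cross entropy reacts strongly to target outliers: the cost grows without bound as a target output approaches zero, whereas any Lp cost of the error stays below one.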

A BP algorithm can be used for training MLPs with the cross entropy cost function. The local gradients are modified only in the output layer of the MLPs; from (148), with a sigmoidal output node the local gradient reduces to d(n) - y(n).

CHAPTER  6


EXPERIMENTS  AND  RESULTS 
6.1  Introduction 

This chapter tests the focus of attention stage proposed for ATR and evaluates the system performance on millimeter wave SAR imagery (the Mission 90 Pass 5 SAR data set).

In Section 6.2, the two prescreeners, the two-parameter CFAR detector and the γCFAR detector, are evaluated based on their detection performance over the entire imagery of the Mission 90 Pass 5 SAR data set. Section 6.3 assesses the detection performance of the QGD and the NL-QGDs applied to the regions of interest (ROIs) passed by the two-parameter CFAR detector. The discriminating powers of the QGD and NL-QGDs are compared based on ROCs over the false alarms and targets detected by the two-parameter CFAR detector. The detection performance of the NL-QGDs trained with different optimality indices is presented for different sizes of networks. The detection performance of the QGD and NL-QGDs in conjunction with the γCFAR detector is presented in Section 6.3.4.


6.2  Prescreening  SAR  Imagery 
6.2.1  Two-Parameter  CFAR  Processing

The two-parameter CFAR detector was run over the 127 frames (about 7 km²) of the Mission 90 Pass 5 data. 345 targets from the TABILS 24 ISAR data base were embedded using the method described in Section 2.7.2. The size of the CFAR stencil was 85 by 85 pixels, for comparison with the CFAR stencil (84 by 84 pixels for 1 ft resolution PWF SAR data) used by Novak et al. [57]. The local mean and standard deviation are computed in the outermost 4-pixel wide boundary of the stencil. The intensity of the test pixel is computed by averaging 3 by 3 pixels in the center of the stencil.
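The stencil geometry just described can be sketched for a single pixel as follows. This is an illustration under stated assumptions: the statistic is taken to be the conventional two-parameter CFAR form (test intensity minus local mean, divided by local standard deviation), the standard deviation is the population form, and the function name and arguments are hypothetical. The exact test used in the dissertation is its equation (51), which is not reproduced in this section.

```python
import math


def cfar_statistic(image, row, col, stencil=85, border=4, center=3):
    """Two-parameter CFAR statistic (x_t - mean) / std at one pixel.

    image is a list of rows of intensities. The clutter statistics come from
    the outermost `border`-pixel frame of the stencil; the test intensity is
    the mean of the `center` x `center` block at the stencil center.
    """
    half, c_half = stencil // 2, center // 2
    if not (half <= row < len(image) - half
            and half <= col < len(image[0]) - half):
        raise ValueError("stencil does not fit inside the image at this pixel")
    # Test cell: average of the central block.
    block = [image[r][c]
             for r in range(row - c_half, row + c_half + 1)
             for c in range(col - c_half, col + c_half + 1)]
    x_t = sum(block) / len(block)
    # Clutter ring: pixels farther than `inner` from the center (square frame).
    inner = half - border
    ring = [image[r][c]
            for r in range(row - half, row + half + 1)
            for c in range(col - half, col + half + 1)
            if max(abs(r - row), abs(c - col)) > inner]
    mean = sum(ring) / len(ring)
    std = math.sqrt(sum((v - mean) ** 2 for v in ring) / len(ring))
    return (x_t - mean) / std  # ZeroDivisionError if the ring is perfectly flat
```

A pixel is declared a detection when this statistic exceeds the chosen threshold; at the default sizes the ring holds 85² - 77² = 1296 pixels, so the local estimates are fairly stable.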

After the CFAR processing, multiple detection points occur on targets and in other regions, because targets and man-made clutter (and tree tops) normally consist of many high reflectivity pixels that trigger the prescreener repeatedly (raw detections). A clustering process over the multiple detections is therefore required to obtain a more representative count of detections and false alarms. The false alarm reduction stage and the classification stage in Figure 2 operate only on the clustered locations. The size of the clustering region was determined by the size of the targets (in this case, the clustering region is 22 pixels long). The clustering proceeds as follows: each frame (512 by 2048 pixels) of the SAR data is processed by the front-end detection stage and the outputs which exceed the threshold in (51) are stored by magnitude; the clustering starts at the location with the maximum output and groups all the raw detections within a representative range of target sizes; next, all those grouped locations are merged into a single representative location based upon the weighted sum of the outputs in the grouping. The number of false alarms is computed from the clustered detections.
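The clustering steps above can be sketched as a greedy pass over the raw detections. Several details are assumptions made only for illustration: the grouping window is taken as a square of half-width `radius` around the strongest remaining detection, the representative location is the output-weighted centroid of the group, and all outputs are assumed positive. The function name and the sample detections are hypothetical.

```python
def cluster_detections(detections, radius):
    """Greedy clustering of raw detections.

    detections: list of (row, col, output) tuples with positive outputs.
    Returns a list of (row, col, total_output), one entry per cluster.
    """
    # Strongest detections first.
    remaining = sorted(detections, key=lambda d: d[2], reverse=True)
    clusters = []
    while remaining:
        seed = remaining[0]
        # Group every raw detection inside the window around the seed.
        group = [d for d in remaining
                 if abs(d[0] - seed[0]) <= radius
                 and abs(d[1] - seed[1]) <= radius]
        remaining = [d for d in remaining if d not in group]
        # Merge the group into one output-weighted location.
        total = sum(d[2] for d in group)
        row = sum(d[0] * d[2] for d in group) / total
        col = sum(d[1] * d[2] for d in group) / total
        clusters.append((row, col, total))
    return clusters


raw = [(10, 10, 4.0), (12, 10, 2.0), (100, 100, 3.0)]
print(cluster_detections(raw, radius=11))  # radius value is a guess
```

Starting from the strongest detection means that a bright target absorbs its weaker neighbouring hits before they can seed clusters of their own, which keeps the false alarm count from being inflated by repeated triggers on one object.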

The two-parameter CFAR detector yielded 4,455 false alarms over the 127 frames of the Mission 90 Pass 5 SAR data set when the detection threshold was set for 100% target detection (all 345 targets). Figure 31 displays some of the detection and clustering results of the two-parameter CFAR detector. The detection points (false positives) in the centers of the subimages (image chips of 85 by 85 pixels) mostly exhibit high contrast relative to their surroundings because the two-parameter CFAR detector depends on relative intensity information.

Figure 31  Some examples of non-target image chips triggered by the two-parameter CFAR detector.

6.2.2  γCFAR  Processing

6.2.2.1  Optimal  Parameter  Search

From the false alarms triggered by the two-parameter CFAR detector for 100% target detection, 550 false alarms were randomly selected as non-target image chips and combined with the set of 345 target image chips to create the data base for finding the optimal parameters (μ_1 and μ_15) of the γCFAR detector. After the clustering process, each of the detected targets and non-target objects was centered in its image chip using its highest intensity pixel. Some of the non-target image chips are shown in Figure 31.

The parameter space of μ_1 and μ_15 was incrementally scanned between 0.0304 and 4.6052. This range was converted from the range (0 to 1) in which gamma kernels in discrete time are stable: first, the range (0 to 1) was scanned in 33 equal steps for stable gamma kernels in discrete time; next, it was converted to the range (0.03 to 4.6052) by the relationship μ_c = -ln(1 - μ_d), where μ_d are the values of the scanned range (0 to 1) in 33 steps and μ_c the converted values for continuous time. Kernels with 33 different μ values were computed, which correspond to memory depths of about 0.5 to 33 pixels for the g_1 kernel and 8 to 495 pixels for the g_15 kernel.
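The conversion between the discrete-time and continuous-time parameter values can be reproduced with a few lines. The step size of 0.03 in the discrete range is an inference from the reported endpoints (0.03 × 33 = 0.99 maps to 4.6052), not a value stated in the text, and the names used are hypothetical.

```python
import math


def continuous_mu(mu_discrete):
    """Map a discrete-time gamma parameter in (0, 1) to continuous time."""
    return -math.log(1.0 - mu_discrete)


STEP = 0.03  # assumed grid spacing, inferred from the reported range
# (index, continuous-time value) pairs for the 33 scanned values.
grid = [(i, continuous_mu(i * STEP)) for i in range(1, 34)]

print(grid[0])    # first scanned value
print(grid[-1])   # last scanned value
```

Under this assumed spacing, the last grid point reproduces the upper limit 4.6052 quoted above and the first gives approximately 0.0305, matching the quoted lower limit to within rounding.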

The false alarms were computed in the following way: first, for each pair of μ_1 and μ_15, the minimum target output of the γCFAR detector is found and set as the threshold (so that all the targets are detected); then, with this threshold, the corresponding false alarm rate is computed. The false alarm surface in the μ_1 and μ_15 space is shown in Figure 32a.

After searching the 2-D false alarm surface, the minimum false alarm rate was found with μ_1 = 1.0788 (index 22, memory depth of 1 pixel) and μ_15 = 0.5978 (index 15, memory depth of 25 pixels). This combination constitutes a γCFAR stencil which can be thought of as a local feature extractor for the best discrimination between the target and non-target classes in the training image chips. The minimum number of false alarms produced by the γCFAR detector was 105 in the training image chips. With the optimal values of μ_1 and μ_15, the guard band was approximately 15 pixels wide, and the local area was approximated by a ring band 10 pixels wide (Figure 32b). Note that the γCFAR detector was able to improve upon the CFAR performance, which produced 550 false alarms, by changing the memory depths of the 2-D gamma kernels (g_1 and g_15) in the stencil.

6.2.2.2  Impact  of  Stencil  Size  in  False  Alarm

The false alarm surface in Figure 32a illustrates the importance of the stencil for detection performance. What is important to note is the dramatic dependence of the false alarm rate on the shapes of the g_1 and g_15 kernels. A small difference in the shapes of the kernels makes a big difference in the false alarm performance. The gamma kernels are continuous functions of the μ's, so they can be differentiated with respect to the parameters μ. Hence a productive way of setting the scale parameters, μ_1 and μ_15, is to use adaptation algorithms from adaptive filter theory.

Deciding on the shape of the stencil from the geometric characteristics of the targets, as done for the two-parameter CFAR stencil, will probably give suboptimal performance. Since the 2-D gamma kernels are circularly symmetric, a single parameter controls the shape of each kernel once the kernel order is fixed. More versatile shapes may perform better, but we have to be prepared for the explosion in the degrees of freedom and the inherent difficulty of setting more parameters.

It is also interesting to see the effect of changing the guard area size and the size of the target masking kernel in the CFAR stencil. In order to compare the CFAR stencil fairly with the γCFAR stencil, circular kernels with abrupt changes in magnitude were used (Figure 33a). The width of the clutter masking kernel was set to 10 pixels, to approximate the width (about 10 pixels) of the optimal g_15 kernel (Figure 32b). The numbers of false alarms were computed while changing r_1 and r_2. The minimum number of false alarms was 360, obtained at r_1 = 2 and r_2 = 27 (Figure 33b). This is about 3.5 times the minimum number of false alarms (105) of the γCFAR detector. This implies that the abrupt shape of the CFAR stencil may degrade detection performance.

Figure 32  The false alarm surface of the γCFAR detector and the corresponding stencil at the optimal parameters: a) false alarm surface in the μ_1 and μ_15 space, b) γCFAR stencil at the optimal μ_1 and μ_15.

Figure 33  Impact of CFAR stencil size on false alarms: a) round-shape CFAR stencil with abrupt change in magnitude, b) false alarms versus the radius r_1 of the target masking kernel and the radius r_2 of the clutter masking kernel from the pivot point in the CFAR stencil.

6.2.2.3  Batch-running  the  γCFAR  Detector

With the optimal set of μ_1 and μ_15, the γCFAR detector was run over the first 127 frames of the Mission 90 Pass 5 SAR data set with the same embedded targets as for the CFAR detector. The size of the γCFAR stencil was 85 by 85 pixels, as for the CFAR stencil.

In order to compute the γCFAR output at each pixel in the image, three features (the 1st moment at the center pixel and the 1st and 2nd moments in the local region) must be computed by convolving the image with two gamma kernels (g_1 and g_15). The convolution was computed in the frequency domain using FFTs for better computational efficiency. The image sequence was divided into overlapping radix 2 windows (2048 by 128 pixels) and processed by an overlap and save method. After processing the entire imagery of the Mission 90 Pass 5 SAR data set with the γCFAR detector, the minimum target output value was selected as the threshold for 100% target detection. All detection points above the threshold were clustered as in the CFAR case.

6.2.3  Performance  Comparison  of  the  Two-Parameter  CFAR  Detector  and  the  γCFAR  Detector

The performances of the two-parameter CFAR detector and the γCFAR detector are compared by ROC curves (Figure 34). For 100% detection, the γCFAR detector yielded 760 false alarms while the CFAR detector produced 4,455 false alarms (about a 1:6 ratio). Fewer false alarms from the prescreener mean that the computational bandwidth of the subsequent processing modules can be decreased. When 2% of the targets are discounted as outliers, the γCFAR and the CFAR detectors yielded 239 and 510 false alarms respectively for 98% target detection.

Overall, the ROC of the γCFAR detector in Figure 34 shows more robust performance than the standard two-parameter CFAR test. Figure 35 shows the number of raw detections (false positives) and clustered detections per frame. The γCFAR detector outperformed the two-parameter CFAR detector in all frames, i.e. for both natural and cultural clutter. We can observe that the number of false positives changes from frame to frame. It is also interesting to note that clustering merges the raw detections into smaller numbers of ROIs in each frame. The overall computation required for clustering is larger for the two-parameter CFAR detector due to its excessive number of false positives.

Figure 37 and Figure 38 display the performance of both detectors in areas having different statistical characteristics, i.e. a natural clutter area from frames 7 and 8 and a cultural clutter area from frames 121 and 122 (Figure 36). In the cultural clutter region, the γCFAR detector produced 59 false alarms while the CFAR detector yielded 113 false alarms. In the natural clutter region, the γCFAR detector produced 4 false alarms and the CFAR detector 28. Note that the objects in the boxes in Figure 36b indicate the embedded targets.
Figure 34 Global ROC measurement over 127 frames of the Mission 90 Pass 5 SAR data set.
In Figure 38, multiple detections occur on top of the targets and on the brighter objects of man-made clutter because these typically have high contrast relative to their background clutter. The two-parameter CFAR detector produced a considerable number of detections along the roof edges of the houses facing the SAR (that is, toward the top of the figure). Many detections also occur on the tops of the trees in natural clutter because the large relative contrast between the trees and their surrounding radar shadows triggers the CFAR detector.

6.2.4  Conclusion 

Unlike the two-parameter CFAR detector, which uses an a priori defined stencil, the γCFAR detector utilizes the family of 2-D gamma kernels, where the free parameter μ changes the shape and scale of the γCFAR stencil. This work shows that the performance of the standard two-parameter CFAR detector can be improved if the size of the stencil is optimized. It is true that the shape of the γCFAR stencil is graded and also different from that of the CFAR stencil, but one of the most important factors in the performance improvement is the stencil size. The optimal value of μ was found through exhaustive search simply to quantify the best possible performance. In an adaptive signal processing framework, we can find the best value of μ through training, either by minimizing false alarms or by minimizing an output error measure. This has been done already for the 1-D case [64], and the methodology can be extended to the 2-D case.

The big challenge is to find an on-line performance criterion that can decide which scale is best. Further research should also look into the use of nonsymmetric kernels. The γCFAR can be interpreted as an estimator of the local intensity statistics. From this perspective the use of the full kernel, instead of only g1 and g15 as motivated by the CFAR stencil, should improve the performance of the detector.
Figure 35 False alarms vs. frame. a) The number of raw detections (false detections) was counted before clustering, and b) the number of false alarms was counted to detect 345 targets (testing target set) after clustering through 127 frames of the Mission 90 Pass 5 SAR data set.

Figure 36 A natural clutter region (a) from frames 121 and 122 and cultural clutter region (b) from frames 7 and 8 in the Mission 90 Pass 5 data set. In the cultural clutter, 13 targets were embedded and marked by rectangular boxes.

Figure 37 The performance comparison of the CFAR and γCFAR detectors. The CFAR detector yielded 28 false alarms while the γCFAR detector triggered only 4 false alarms.

Figure 38 The performance comparison of the CFAR and γCFAR detectors. The CFAR detector yielded 113 false alarms while the γCFAR detector triggered only 59 false alarms.
6.3 False Alarm Reduction

6.3.1 CFAR/QGD

6.3.1.1 Training the QGD

6.3.1.1.1 Training Data Preparation

In order to train the QGD, 275 embedded target image chips of a training set were used for the target class, and 550 clutter image chips were randomly selected from the 4455 false alarms caused by the two-parameter CFAR detector. Before the target and clutter image chips were collected, clustering was performed and multiple detections within a target size (23 pixels long) were clustered into a centroid.

6.3.1.1.2 Optimal Weights by the Closed Form Solution

The least squares method is used to compute the optimal weights of the QGD on the training set. The weights are found such that the power of the differences between the system outputs and the corresponding desired responses is minimized (L2 norm).

Since the size of the training set (550 clutter and 275 target image chips) is greater than the number of weights (8 weights), the method solves an overdetermined system of linear equations by (85). It is a parametric least squares problem because the least squares solution also depends on the parameters μ1 and μ15. The parameters and weight vector of the QGD can be obtained through an exhaustive search in the parameter space. For each combination of μ1 and μ15, the corresponding least squares solution is computed. The optimal parameters and weight vector are those for which the number of false alarms in the training set is minimum.
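The search described above can be sketched as a grid search wrapped around an ordinary least squares fit. This is a minimal illustration assuming NumPy; `extract_features` is a hypothetical callable standing in for the gamma-kernel feature computation (it is not defined in the text), and the bias column is an assumption about how the eighth weight enters.

```python
import numpy as np

def fit_weights(features, desired):
    """Least squares weights for the features plus a bias term."""
    A = np.hstack([features, np.ones((features.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A, desired, rcond=None)
    return w

def false_alarms_full_detection(outputs, desired):
    """False alarms when the threshold is the lowest target output."""
    threshold = outputs[desired == 1].min()
    return int(np.sum(outputs[desired == 0] >= threshold))

def grid_search(extract_features, chips, desired, mu1_grid, mu15_grid):
    """Return (false_alarms, mu1, mu15, weights) with the fewest false alarms."""
    best = None
    for mu1 in mu1_grid:
        for mu15 in mu15_grid:
            F = extract_features(chips, mu1, mu15)   # hypothetical feature extractor
            w = fit_weights(F, desired)
            outputs = np.hstack([F, np.ones((F.shape[0], 1))]) @ w
            fa = false_alarms_full_detection(outputs, desired)
            if best is None or fa < best[0]:
                best = (fa, mu1, mu15, w)
    return best
```

The inner fit is closed form for every parameter pair, so the cost of the search grows only with the number of grid points, not with any iterative training.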

Figure 39a shows the false alarm surface in the parameter space μ. The optimal indices of the parameters μ1 and μ15 are 8 and 16, which correspond to 0.274 and 0.654 respectively. Note that the optimal value of μ1 is much smaller than that of μ1 in the γCFAR detector, which means that the g1 kernel of the QGD is much broader. Since, in the training set of the QGD, the clustering points are the centers of mass of detections, the brightest pixels are not always placed at the centers of the image chips. The g1 kernel of the QGD stretches out to capture the bright pixel intensities, which usually help discriminate targets from clutter. So g1 ends up being a broader kernel than that in the γCFAR stencil.

Another distinguishing feature of the QGD false alarm surface is that it shows small numbers of false alarms at several different locations in the parameter surface (local minima in the false alarms). This is because, in addition to the parameters (the μ's), the QGD has 8 weights, which allow for more discriminating power than the γCFAR detector. That is, the optimal weights counterbalance improper choices of the parameters.

The SSEs of the QGD were computed in the parameter space in the same way as the false alarms, and the SSE surface is shown in Figure 39b. Compared to the false alarm surface, the SSE surface is much smoother, and the optimal indices of μ1 and μ15 yielding the least SSE are 4 and 16, which correspond to 0.128 and 0.654. The optimal μ15 in the false alarm surface is the same as the optimal μ15 in the SSE surface, but the least SSE is found at a smaller value of μ1 than the minimum false alarm.

However, the two optimal points in the SSE and false alarm surfaces are close to each other in the parameter space. This implies that, as in the case of the LS method with an exhaustive search of the parameter space (w1, ..., w8), the QGD may be able to obtain comparable false alarm performance from a gradient descent method that adaptively computes the free parameters (μ1, μ15, w1, ..., w8), provided the SSE approaches the neighborhood of the global minimum in the 10-dimensional parameter space (μ1, μ15, w1, ..., w8) during training.


Table 1 Detection performance of QGD in the training set.

Quadratic Gamma Detector (QGD)

Target detection rates   100%   99%   98%   95%   92%   SSE
# of false alarms        14     14    8     4     3     0.0273
Figure 39 Performance surfaces of the QGD in parameter space. a) False alarm surface, b) SSE surface with optimal weights.

Figure 40 The discriminating performance of the QGD in the training set. a) ROC curves, b) QGD outputs for the inputs of clutter and target image chips after training.
As listed in Table 1, the QGD yielded 14 false alarms in the training set at both the 100% detection rate (all 275 targets detected) and the 99% detection rate; for 98%, 95% and 92% target detection, 8, 4, and 3 false alarms occurred respectively. As shown in Figure 40b, the clutter and target outputs were well separated.

6.3.1.1.3 Testing the QGD

Given a set of parameters, the QGD training seeks the optimal weights by minimizing the sum of squared errors between the outputs and their corresponding desired responses. The optimal set of parameters (μ1 and μ15) is the one that leads to the minimum number of false alarms in the training set. With the optimal set of parameters and optimal weights, the QGD was tested on different sets of clutter and target image chip embeddings. Table 2 shows the performance of the QGD in the testing phase in terms of SSE and false alarms at given detection rates.

The QGD exhibits a discrimination power of reducing 3905 false alarms to 385, 118, 97, 53 and 42 false alarms for 100%, 99%, 98%, 95%, and 92% detection rates respectively. For all clutter image chips (4455 false alarms caused by the two-parameter CFAR detector), the false alarms were reduced by the QGD to 422, 132, 109, 57, and 44 at 100%, 99%, 98%, 95%, and 92% detection rates respectively. A discrimination power of about a 1:10 ratio (422/4455) was obtained by the QGD at 100% target detection in the false alarm reduction stage of the ATD/R system (Figure 2).

6.3.1.2 Training the QGD in an Iterative Manner

An optimal weight set of the QGD was computed by LS for given μ1 and μ15 in the previous section. The optimal parameter set (μ1 and μ15) was selected based on the minimum false alarm count for 100% detection after an exhaustive search of the 2-D parameter space.

Figure 41 Testing results of the QGD for testing set #1.1. a) ROC curve, b) outputs of the QGD.

Figure 42 Testing results of the QGD for testing set #2. a) ROC curves, b) outputs of the QGD.

Table 2 Detection performance of QGD in testing.

Quadratic Gamma Detector (QGD)

Target detection rates                       100%   99%   98%   95%   92%   SSE
No. of false alarms in testing set #1.1      385    118   97    53    42    0.0202
No. of false alarms in testing set #2        422    132   109   57    44    0.0192

Note that testing set #1.1 contains 3905 clutter image chips (the 4455 false alarms of the two-parameter CFAR detector minus the 550 training clutter image chips) and 345 target image chips embedded over 127 frames (about 7 km²) of the Mission 90 Pass 5 data set for testing purposes. Testing set #2 includes all the clutter image chips (4455) and the 345 target image chips embedded for testing.

Here the weights and the parameters (μ1 and μ15) were adaptively computed in an iterative manner by a gradient descent method during the training. At each iteration, the learning and momentum rates were changed to accelerate the convergence of the weight and parameter variables: when the current error was smaller than the error at the previous iteration, the learning and momentum rates were increased by small amounts; otherwise they were decreased to 70% of their values at the previous iteration. In the QGD feature set, the intensity mean values are much smaller than the intensity squared means. This causes a wide spread in the optimal weight values, so the weight and μ adaptation were performed on a whitened feature set at each iteration. The training was stopped when the change in the number of false alarms for 100% detection was less than two false alarms for 3500 iterations.
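The rate schedule and the stopping rule described above can be sketched as two small helpers. The 70% shrink factor and the 3500-iteration window come from the text; the multiplicative growth factor `grow=1.02` is an assumption (the text says only "small amounts"), and reading "change less than two" as the spread of the false alarm count over the window is also an assumption.

```python
def adapt_rates(lr, momentum, err, prev_err, grow=1.02, shrink=0.7):
    """Raise both rates slightly after an improvement, else cut them to 70%."""
    if err < prev_err:
        return lr * grow, momentum * grow      # grow factor is assumed
    return lr * shrink, momentum * shrink      # 70% as stated in the text

def should_stop(false_alarm_history, window=3500, tol=2):
    """Stop when the false alarm count varied by less than tol over the window."""
    if len(false_alarm_history) < window:
        return False
    recent = false_alarm_history[-window:]
    return max(recent) - min(recent) < tol
```

Tying the stopping rule to the false alarm count rather than to the SSE keeps the training aligned with the figure of merit actually used to compare detectors.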

Figure  43a  and  Figure  43b  show  the  learning  curve  and  false  alarm  curve  of  the  QGD. 
The  number  of  false  alarms  for  100%  detection  decreases  as  the  error  decreases  during  the 
training. 
after training the QGD, d) Outputs of the QGD from the training set after training the QGD, e) ROC curve from the testing set, and f) Adaptation of the parameters (μ1 and μ15).
Figure 44 Adaptation of weights
In the LS solution, the optimal set of parameters was chosen based on the minimum number of false alarms. In this iterative method, the parameters were adaptively computed in the sense of minimizing the SSE. In Figure 43f and Figure 44, the parameter μ15 and the weights w2 and w8 did not seem to converge to definite values. Since the training was stopped based on the change in the false alarms for 100% target detection, the outputs of the QGD after the training was stopped were well separated (Figure 43d). Table 3 shows the detection performance of the QGD trained in the iterative manner.


Table 3 Detection performance of the adaptively trained QGD in the training set.

Quadratic Gamma Detector (QGD)

Target detection rates   100%   99%   98%   95%   92%   SSE
# of false alarms        35     14    8     6     4     0.0196

In the training set, the detection performance of the iteratively trained QGD is inferior or at best equal to that of the QGD trained by LS over the range of detection probabilities considered (100% to 92%); compare Table 3 with Table 1.


Table 4 Detection performance of the adaptively trained QGD in the testing set.

Quadratic Gamma Detector (QGD)

Target detection rates   100%      99%       98%       95%     92%     SSE
# of false alarms        598/695   142/162   106/122   48/55   36/42   0.0281/0.0273

In the testing set, as in the training set, the iteratively trained QGD yielded more false alarms at high probabilities of detection (100% to 98%) (Table 4). At detection rates of about 95% and below, it yielded fewer false alarms than the QGD trained by LS.
6.3.1.3 Independent Testing Results for the QGD

A more complete study of the QGD was independently conducted at MIT Lincoln Laboratory using a large SAR database of real military targets and ground clutter. The objective of this study was to evaluate the performance of the QGD in aiding the detection stage of the SAR ATD/R system. The data for this first test case consisted of high resolution (1 ft × 1 ft) fully polarimetric SAR imagery preprocessed using the PWF.

The QGD was trained on two target types using spotlight target data and also man-made discretes from stripmap clutter data. A total of 135 target image chips were chosen for training; these were 5 degrees apart in aspect angle (i.e., 5, 10, 15 degrees, etc.). The clutter data used for training consisted of 100 typical man-made discretes. Evaluation of this test case was performed using spotlight target and stripmap clutter data. As in the training stage, spotlight data of two targets that were 5 degrees apart in aspect angle (i.e., 3, 8, 13, 18 degrees, etc.) were used for testing. The test clutter data consisted of 4727 stripmap clutter image chips extracted from a total of 56 km² in area. Thus, the test data set for this experiment was composed of 139 target image chips and 4727 clutter image chips.

The QGD was evaluated by running the data through the two-parameter CFAR detector first (i.e., the QGD was applied only to the image chips that triggered the two-parameter CFAR detector). Then, the ROC curves were obtained by computing the cumulative number of false alarms out of each detector. At a probability of detection of 1.0 (Pd = 1.0), the CFAR algorithm detected 139 targets and had 2499 false alarms, whereas the QGD, while also detecting 139 targets, reduced this false alarm number to 715 (Figure 45).

The second test case study used single channel (HH) stripmap imagery with a resolution of 1 m × 1 m. The training set for the QGD consisted of 52 target image chips and 150 clutter image chips that represented two types of targets and man-made clutter. The evaluation of the QGD was performed using 75 target image chips and 44599 clutter image chips, which triggered the two-parameter CFAR detector when analyzing 231 km² of area.
Figure 45 Discriminant performance of the QGD versus CFAR.
Figure 46 Comparison of the two-parameter CFAR and the QGD for 1 polarization, 1 meter data.
Figure 46 shows the results. At Pd = 100%, the two-parameter CFAR detector had 39,709 false alarms, while the QGD also detected all 75 targets and had only 19,037 false alarms.

These two experiments constitute a comprehensive test of the performance of the QGD. As expected, the generalization capability of the QGD is very good, which should not be surprising since the QGD has only 8 free parameters. Consequently, in practice the QGD can be trained for robust performance. The better discriminant characteristics of the QGD may be due to better estimates of the mean and standard deviation for targets and clutter, or to the larger feature space.

6.3.1.4 Conclusion

The appeal of the QGD is that the scale of the regions where the statistics are estimated can be adapted during training with the error at the output of the detector. The implementation that we chose uses only a subset of the gamma kernels (g1 and g15). It was derived by analogy with the two-parameter CFAR detector to enable a straightforward comparison with this widely used algorithm. In the QGD, the estimates of the local statistics are obtained by convolution with the g1 (cell under test) and g15 (local neighborhood) kernels. The QGD is superior to the conventional two-parameter CFAR in terms of detection performance. The Lincoln Laboratory testing also shows that the QGD can improve the false alarm rate of the two-parameter CFAR detector without affecting the probability of detection.

6.3.2 CFAR/NL-QGDs

6.3.2.1 L2-Based Training and Testing of the NL-QGDs

In order to compare the performance of the QGD and NL-QGD based on the same feature values, the same optimal parameters for the QGD, μ1 and μ15, were used for the NL-QGD. Since the QGD features are highly correlated with one another, it is desirable to whiten the training data set for the adaptation of weights in the NL-QGD training to achieve fast convergence. The following matrix is the correlation coefficient matrix of the QGD training set. As seen in the matrix, the first column shows high correlation among the values of the 1st, 3rd, 5th and 7th rows ((g1 • X), (g1 • X²), (g1 • X)² and (g1 • X)(g15 • X)), the second column exhibits high correlation among the values of the 2nd, 4th, 6th and 7th rows ((g15 • X), (g15 • X²), (g15 • X)² and (g1 • X)(g15 • X)), and so on.


1.0000   0.7733   0.9946   0.7617   0.9971   0.7677   0.9558
0.7733   1.0000   0.7287   0.9959   0.7478   0.9961   0.9192
0.9946   0.7287   1.0000   0.7205   0.9991   0.7276   0.9372
0.7617   0.9959   0.7205   1.0000   0.7401   0.9995   0.9163
0.9971   0.7478   0.9991   0.7401   1.0000   0.7468   0.9472
0.7677   0.9961   0.7276   0.9995   0.7468   1.0000   0.9204
0.9558   0.9192   0.9372   0.9163   0.9472   0.9204   1.0000

By using the whitening transformation [27], the training data set was orthogonalized, and the resulting correlation coefficient matrix becomes the 7 × 7 identity matrix.
Backpropagation with a momentum term was used to train the NL-QGD weights with the whitened training data set. The cross validation set was also whitened by using the estimate of the covariance matrix of the training data set. Training was stopped when the error in the cross-validation set started to increase.
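Whitening with statistics estimated on the training set only, and reused for the cross validation set, can be sketched as follows. This assumes NumPy and uses eigendecomposition (PCA) whitening, which is one common form; the specific transformation of [27] is not reproduced here, and the names `fit_whitener`, `whiten`, and the floor `eps` are illustrative.

```python
import numpy as np

def fit_whitener(train, eps=1e-12):
    """Estimate mean and whitening matrix from the training features."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)        # cov is symmetric
    # Scale each eigenvector by 1/sqrt(eigenvalue); eps guards tiny values.
    W = eigvec / np.sqrt(np.maximum(eigval, eps))
    return mean, W

def whiten(data, mean, W):
    """Apply the training-set whitening to any feature matrix."""
    return (data - mean) @ W
```

On the training set the whitened features have an identity covariance matrix, whereas the cross validation set is only approximately white because it is transformed with the training estimates.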

The learning and false alarm curves of the NL-QGD with 5 hidden nodes are plotted in Figure 47. With the L2 norm training, it is observed in Figure 47b that the number of false alarms for 100% target detection in the cross validation and training sets decreased as the SSE decreased for about the first 200 training epochs. However, the false alarms increase between about the 200th and 1000th epoch and saturate after about the 1000th epoch. This implies that the outliers of the target class produce very low output values after the training and that the L2 norm may not be a good criterion for training a detector. So the target outliers should be prevented from producing very low output values.

Figure 48 shows the ROC plots and outputs of the QGD and NL-QGD after the training. Figure 49 and Figure 50 plot the ROC curves and the outputs of the NL-QGD751 in the testing sets.

The performances of the QGD and NL-QGD are compared for different network sizes, and the SSE and the number of false alarms are presented for both systems in Table 5 and Table 6. In the training set, the ROCs of the QGD and the NL-QGD are comparable. In the testing sets, the NL-QGD outperforms the QGD at most detection rates. The NL-QGD is able to provide a smaller final SSE than the QGD. However, the performance of a detector is measured not in terms of SSE but in terms of the number of false alarms for a given detection accuracy. Hence, the ROC curves of the two systems must be compared. The comparisons were restricted to networks with one hidden layer, and the hidden layer size was increased from 3 to 7 nodes. The NL-QGD did not perform as well as the QGD at the 100% detection rate. The detector output of the NL-QGD tended to produce large misclassifications, which affected the selection of the threshold. However, at 99%, 98% and 95% detection probability the NL-QGD outperformed the QGD. Note that changing the number of hidden nodes from 3 to 7 did not seem to affect the performance of the NL-QGD much.
Figure 47 Learning and false alarm curves of the NL-QGD751 trained based on the L2 norm.

Figure 48 Training results of the L2 normed NL-QGD751. a) ROC curves, b) outputs of the NL-QGD751 for the training set.

Figure 49 Testing results of the L2 normed NL-QGD751 for testing set #1.1. a) ROC curves, b) outputs of the NL-QGD751.

Figure 50 Testing results of the L2 normed NL-QGD751 for testing set #2. a) ROC curves, b) outputs of the NL-QGD751.
Table 5 Detection performance of NL-QGD in training set.

                     Detection Rates
Network Topologies   100%   99%   98%   95%   92%   SSE        Stop iterations
NL-QGD731            80     6     3     3     2     0.005039   4156
NL-QGD751            80     6     3     3     2     0.005038   4139
NL-QGD771            80     6     3     3     2     0.005048   4117

The NL-QGD731 indicates that the network has 7 input nodes, 3 nodes in the hidden layer, and 1 output node. For NL-QGD training, the learning rate (η) and momentum rate (α) were 0.1 and 0.02 respectively for all the networks.


Table 6 Detection performance of NL-QGD in testing.

Detection Rates (No. of false alarms in testing set #1.2 / No. of false alarms in testing set #2)

Network Topologies   100%        99%       98%      95%     92%     SSE
QGDls                340/422     105/132   85/109   45/57   35/44   0.0204/0.0192
QGDadapt             514/695     115/162   86/122   41/55   32/42   0.0292/0.0273
NL-QGD731            1456/1892   96/120    67/83    35/46   30/37   0.006263/0.005848
NL-QGD751            1453/1889   96/120    67/83    35/46   30/37   0.006261/0.005846
NL-QGD771            1445/1880   97/120    67/83    35/46   30/37   0.006262/0.005847

Note that testing set #1.2 contains 3355 clutter image chips (the 4455 false alarms of the two-parameter CFAR detector minus the 550 training and 550 cross validation clutter image chips) and 345 target image chips embedded over 127 frames (about 7 km²) of the Mission 90 Pass 5 SAR data set for testing purposes.
The NL-QGD performed better than the QGD at most detection probabilities. However, at 100% detection its performance is inferior to that of the QGD. By analyzing the detection outputs (Figure 48b), we can observe that the NL-QGD did a much better job in terms of SSE. When it gave the wrong output, however, the error was much larger than the QGD's. We can expect this from the nonlinear nature of the detector. In terms of the number of false alarms, this behavior affects the performance because the threshold for 100% detection has to be set based on the smallest value obtained from the target image chips.

6.3.2.2 Training and Testing of the NL-QGDs Without Non-Target Outliers

In the L2 norm training, it was observed that the NL-QGDs produced very low output values for the target outliers and high values for the non-target outliers. By removing the non-target outliers during the training, the decision boundaries of the NL-QGDs move toward the target outliers so that the output values of the NL-QGDs for the non-target outliers increase. This causes the threshold for a high probability of target detection to increase.

The training procedure for the NL-QGDs is as follows: first, the NL-QGDs are trained based on the L2 norm and the number of false alarms is computed at each iteration during the training; second, when the number of false alarms in the cross validation set starts increasing, the non-target sample which produced the maximum output is removed from the training set; third, this training procedure proceeds until the number of false alarms does not change for 250 iterations.
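The removal step in the second stage of this procedure can be sketched in isolation. This is an illustrative helper, not the training code itself: `outputs` are the current network outputs for all training samples, `is_target` flags the target samples, and `active` lists the indices still used for training. All three names are hypothetical.

```python
def drop_top_clutter(outputs, is_target, active):
    """Remove the active non-target sample with the largest output."""
    candidates = [i for i in active if not is_target[i]]
    if not candidates:
        return list(active)                 # nothing left to remove
    worst = max(candidates, key=lambda i: outputs[i])
    return [i for i in active if i != worst]

# Toy example: samples 1 and 2 are clutter; sample 2 has the largest output.
outputs = [0.9, 0.2, 0.8, 0.1]
is_target = [True, False, False, True]
remaining = drop_top_clutter(outputs, is_target, [0, 1, 2, 3])
```

In a full training loop this helper would be called each time the cross validation false alarm count starts to rise, and training would continue on the reduced index list.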

Figure 51 shows the ROC curve and the outputs of the NL-QGDs after training was stopped. As expected, the output values from the target outliers increased significantly, along with high output values from the non-target outliers. This is because the decision boundary moved toward the target outliers so that the distances between the non-target outliers and the decision boundary became larger.
Figure 51 Training results of the NL-QGD751 trained without non-target outliers. a) ROC, b) outputs of the NL-QGD751 for the training set.

Figure 52 Testing results of the NL-QGD751 trained without non-target outliers for the testing set #1.2. a) ROC curves, b) outputs of the NL-QGD751.

Figure 53 Testing results of the NL-QGD751 trained without non-target outliers for the testing set #2. a) ROC curves, b) outputs of the NL-QGD751.

Table 7 summarizes the training results of the NL-QGDs trained on the L2 norm with removal of non-target outliers during training for different network sizes (3, 5, and 7 nodes in the hidden layer). During training, the numbers of non-target outliers removed from the training set were 8, 9, and 6 respectively. Note that the stop iterations for all network sizes are extremely small compared to the case without removal of non-target outliers. Figure 52 and Figure 53 show the ROC curves and the outputs of the NL-QGD751 for the two testing sets. The minimum target output value is much higher than that from the NL-QGD751 trained on L2 without removal of non-target outliers.


Table 7 Detection performance of the NL-QGDs trained without non-target outliers (training set).

Network      Detection Rates                   SSE       # of non-target    Stop
Topologies   100%   99%   98%   95%   92%                outliers removed   iterations
NL-QGD731    13     13    8     4     3        0.00648   8                  567
NL-QGD751    14     13    9     6     4        0.00618   9                  342
NL-QGD771    14     12    9     5     5        0.00630   6                  426

For NL-QGD training, the learning rate (η) and momentum rate (α) were 0.196 and 0.02 respectively for all the networks.

In the testing sets, some of the non-target outputs are very large due to the boundary shift towards the target outliers. The detection performance of the NL-QGDs in the testing sets is summarized in Table 8. The number of false alarms at 100% detection rate is significantly reduced compared to the L2 case without removal of non-target outliers for all network sizes (3, 5, and 7 nodes). The detection performance at 99% detection rate is slightly better than in the L2 case without removal of non-target outliers. Below 98% detection rate, the performance became worse because training with removal of non-target outliers led the NL-QGDs to lose their generalization capability.

Table 8 Detection performance of the NL-QGD trained without non-target outliers (testing sets).

Entries are No. of false alarms in testing set #1.2 / No. of false alarms in testing set #2.

Network      Detection Rates                                    SSE
Topologies   100%      99%       98%      95%     92%
NL-QGD731    293/365   103/130   74/96    45/58   35/45         0.01047/0.01030
NL-QGD751    250/315   97/125    77/101   49/63   37/49         0.01768/0.01741
NL-QGD771    246/311   98/126    74/97    48/61   37/48         0.01624/0.01596

6.3.2.3 Lp Based Training and Testing of the NL-QGD

As observed in the NL-QGD training based on L2, some of the target outputs of the NL-QGDs are very low. Since the threshold for 100% detection is set at the smallest target output of the NL-QGD, low target outputs cause a considerable number of false detections at high detection probability.
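
The link between the lowest target output and the false alarm count can be made concrete with a short sketch. The function name and the convention that a non-target output at or above the threshold counts as a false alarm are assumptions made for illustration.

```python
import math


def false_alarms_at(target_outputs, nontarget_outputs, detection_rate=1.0):
    """Count non-target outputs at or above the threshold that keeps
    the requested fraction of targets detected.

    At detection_rate = 1.0 the threshold is the smallest target output,
    so a single low target output admits every non-target above it.
    """
    ranked = sorted(target_outputs, reverse=True)
    n_keep = max(1, math.ceil(detection_rate * len(ranked)))
    threshold = ranked[n_keep - 1]
    return sum(1 for y in nontarget_outputs if y >= threshold)


targets = [0.9, 0.8, 0.1, 0.7]        # one target outlier with a low output
nontargets = [0.05, 0.2, 0.75, 0.3]
print(false_alarms_at(targets, nontargets, 1.0))   # threshold set by the outlier
print(false_alarms_at(targets, nontargets, 0.75))  # outlier no longer sets the threshold
```

Giving up the single target outlier moves the threshold from 0.1 to 0.7 in this toy data, which is the effect the larger norms try to achieve without missing the target.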

One way of alleviating large deviations of the NL-QGD outputs from their desired values is to use a larger norm (p > 2) than the L2 norm. With p = 8 chosen as the large norm, a BP algorithm with momentum was used to train the NL-QGDs. The L8 norm is an appropriately large norm for penalizing large errors. The same sets for training and cross validation were used to train the NL-QGDs. The training was stopped when the cost in the cross validation set started increasing.

Figure 54 shows the learning and generalization curves and the false alarm curves of the NL-QGD751 in the training and cross validation sets. As opposed to the case of the L2 norm based training, the false alarm curves decreased as the training epochs increased, apart from a period of false alarm increase between about the 200th and 1000th epochs. The decrease in false alarms is in reasonable agreement with the minimization of the cost during training.

Table 9 illustrates the detection performance of the NL-QGDs with 3, 5, and 7 nodes at 100%, 99%, 98%, 95%, and 92% detection rates. The L8 normed NL-QGDs yielded higher SSEs than the QGD and the L2 normed NL-QGDs after training. However, fewer false alarms occurred for all NL-QGDs with the L8 norm than for the QGD trained by least squares. This implies that the large errors were weighted more than in the L2 case and the false alarms were therefore greatly reduced. In addition, the SSE criterion is not the best choice from a detection standpoint. The threshold for 100% detection was increased, thus avoiding the excessive number of false alarms caused by a low threshold. The NL-QGDs with a large norm also show more robust performance than the NL-QGDs with L2 and the QGD in the ROC curves in Figure 55a, Figure 56a, and Figure 57a.

During the training, the effective learning rate at the output layer and hidden layer [74] is modified to $\tilde{\eta}(n) = \eta\,|e(n)|^{p-2}$, where $\eta$ is the learning rate at p = 2 and $e(n)$ is the error at iteration n at the output layer. As the errors become smaller with increasing iteration number, the convergence of the NL-QGDs based on a large norm slows down due to the (p-2)-powered pre-multiplication factor. The training iterations of the NL-QGDs based on the L8 norm are about 10 times larger than those of the QGD for 3, 5, and 7 nodes (Table 9).
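
The slowdown caused by the (p-2)-powered factor can be illustrated numerically. The sketch below assumes a cost of the form (1/p)|e|^p, so that the factor is exactly |e|^(p-2); the function name and this normalization are assumptions and may differ from the form used in [74].

```python
def effective_learning_rate(eta, error, p):
    """Step size seen by the weights when the cost is (1/p) * |e|^p.

    eta   : learning rate that would apply for p = 2
    error : output-layer error e(n) at the current iteration
    p     : order of the norm (p = 2 recovers plain eta)
    """
    return eta * abs(error) ** (p - 2)


# With p = 8 the factor |e|^6 shrinks quickly as the error falls,
# which is why many more iterations are needed than for p = 2.
for e in (1.0, 0.5, 0.1):
    print(e, effective_learning_rate(0.9, e, 8))
```

For an error of 0.1 and p = 8 the step is scaled by 10^-6, so late in training the weights barely move even with a nominal learning rate of 0.9.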

Table 10 shows the L8 normed NL-QGD testing over the same testing sets as for the QGD and L2 normed NL-QGD cases. The ROC curves and the outputs of the L8 normed NL-QGDs are plotted in Figure 56 and Figure 57. The large errors for the two classes were reduced due to the larger norm (L8).

The NL-QGDs with 3, 5, and 7 nodes yielded about 200 more false alarms than the QGD at 100% detection but fewer false alarms than the QGD below 100% detection.

Figure 54 Learning curve and false alarm curve for 100% target detection of the L8 normed NL-QGD751.

Figure 55 Training results of the L8 normed NL-QGD751. a) ROC, b) outputs of the L8 NL-QGD751 for the training set.

Figure 56 Testing results of the L8 normed NL-QGD751 with 3, 5, 7 nodes for the testing set #1.2. a) ROC curves, b) outputs of the L8 normed NL-QGD751.

Figure 57 Testing results of the NL-QGD751 with 3, 5, 7 nodes for the testing set #2. a) ROC curves, b) outputs of the NL-QGD751 with 3, 5, 7 nodes.

Table 9 Detection performance of L8 normed NL-QGD in training set.

Network      Detection Rates                   SSE       Costs**    Stop
Topologies   100%   99%   98%   95%   92%                           iterations
NL-QGD731    10     10    3     2     2        0.02168   2.550e-5   43144
NL-QGD751    12     10    5     3     1        0.02171   2.559e-5   48634
NL-QGD771    11     10    3     3     2        0.02221   2.590e-5   41819

For NL-QGD training, the learning rate (η) and momentum rate (α) were 0.9 and 0.7 respectively for all the networks.


Table 10 Detection performance of L8 normed NL-QGD in testing.

Entries are No. of false alarms in testing set #1.2 / No. of false alarms in testing set #2.

Network      Detection Rates                                  SSE                 Costs
Topologies   100%      99%      98%     95%     92%
NL-QGD731    522/652   95/119   57/73   31/42   25/31         0.019355/0.018434   6.877e-5/6.144e-5
NL-QGD751    506/632   91/115   54/70   32/43   22/29         0.018682/0.017697   7.041e-5/6.114e-5
NL-QGD771    512/640   91/114   57/73   31/42   24/30         0.019235/0.018255   6.720e-5/5.862e-5

The L8 normed NL-QGDs outperformed the L2 normed NL-QGDs in the ROC for all of the 3, 5, and 7 node cases. The training performances of the L8 normed NL-QGDs are shown to be more robust than those of the QGD and the L2 normed NL-QGDs. The NL-QGD with 3 nodes shows better detection performance than the cases with 5 and 7 nodes between about 200 and 300 target detections in Figure 56a and Figure 57a, but in the rest of the operating range the detection performances are comparable for the 3, 5, and 7 node cases.

6.3.2.4 Training and Testing the NL-QGD with Mixed Lp Norms

In order to reduce the effect of non-target outliers on the formation of the decision boundary in the input space, a small norm (p = 1.1) was imposed on the non-target class and a large norm (p = 8) on the target class. Table 11 shows the detection performance of the L1.1/L8 normed NL-QGDs with 3, 5, and 7 nodes. A BP algorithm with momentum was used to train the L1.1/L8 normed NL-QGDs. The algorithm was modified only at the output layer, where the local gradient is computed according to (142). The weight adaptation was performed after presenting all training exemplars, and the training was stopped when the cost in the cross validation set started increasing.
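
A minimal sketch of a class-dependent norm is given below. It is not a transcription of equation (142): the (1/p)|e|^p normalization, the desired values of 1 for targets and 0 for non-targets, and the function names are assumptions chosen for illustration.

```python
import math


def mixed_norm_cost(desired, outputs, p_target=8.0, p_nontarget=1.1):
    """Batch cost with a large norm on targets and a small norm on non-targets."""
    total = 0.0
    for d, y in zip(desired, outputs):
        p = p_target if d == 1 else p_nontarget
        total += abs(d - y) ** p / p
    return total


def error_weight(d, y, p_target=8.0, p_nontarget=1.1):
    """Error term sign(e) * |e|^(p-1) that replaces e in the output-layer
    local gradient (the activation derivative is applied separately)."""
    p = p_target if d == 1 else p_nontarget
    e = d - y
    return math.copysign(abs(e) ** (p - 1), e)


# Same error magnitude, very different emphasis per class.
print(error_weight(1, 0.5))   # target: small errors are strongly attenuated
print(error_weight(0, 0.5))   # non-target: weight stays close to 1 in magnitude
```

With p = 1.1 the weight on a non-target error is nearly independent of its size, so a non-target outlier pulls on the decision boundary hardly more than a well-classified sample does.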

In Figure 58b, the false alarm curves for both the training and cross validation sets decrease as the training iterations increase. The smaller norm (L1.1) imposed on the non-target class emphasizes small errors and deemphasizes large errors. In Table 11, the L1.1/L8 normed NL-QGDs outperformed the QGD at high probabilities of detection (100%, 99%, 98%, 95%, 92%) after the training.


Table 11 Detection performance of mixed norm NL-QGD in training set.

Network      Detection Rates                   SSE       Costs**   Stop
Topologies   100%   99%   98%   95%   92%                          iterations
NL-QGD731    12     12    10    5     4        0.00682   0.4393    76775
NL-QGD751    12     12    10    5     4        0.00662   0.4345    82400
NL-QGD771    12     12    10    5     4        0.00664   0.4358    79747

For NL-QGD training, the learning rate (η) and momentum rate (α) were 0.0001 and 0.00001 respectively for all the networks.

Figure 58 Learning curve and false alarm curve (versus epochs) for 100% target detection of the L1.1/L8 normed NL-QGD751.

Figure 59 Training results of the L1.1/L8 normed NL-QGD751. a) ROC curve, b) outputs of the L1.1/L8 normed NL-QGD751 for the training set.

Figure 60 Testing results of the L1.1/L8 normed NL-QGD751 for the testing set #1.2. a) ROC curves, b) outputs of the L1.1/L8 normed NL-QGD751.

Figure 61 Testing results of the L1.1/L8 normed NL-QGD751 for the testing set #2. a) ROC curves, b) outputs of the L1.1/L8 normed NL-QGD751.

Table 12 Detection performance of mixed norm NL-QGD in testing.

Entries are No. of false alarms in testing set #1.2 / No. of false alarms in testing set #2.

Network      Detection Rates                                   SSE
Topologies   100%      99%      98%      95%     92%
NL-QGD731    232/292   97/124   76/101   40/63   36/45         0.00830/0.00795
NL-QGD751    252/316   96/123   76/101   48/63   36/45         0.00833/0.00800
NL-QGD771    251/315   96/123   76/101   48/63   36/45         0.00833/0.00799

The performance of the L1.1/L8 normed NL-QGDs in the testing sets is presented for different network sizes in Table 12. In the 98% to 100% detection range of the ROC shown in Figure 60a and Figure 61a, the L1.1/L8 normed NL-QGDs improved the false alarm rates compared to the QGD and yielded the lowest false alarm rate at 100% detection among the QGD and the L2 normed and L8 normed NL-QGDs for the three different network sizes.

Consequently, imposing a large norm on the target class moved the decision boundary towards the outliers of the target class so that the L1.1/L8 normed NL-QGDs did not produce large errors for the target outliers. The detection thresholds were therefore set higher at high probabilities of detection so that more non-target inputs could be rejected. The small errors from the target inputs were deemphasized further, so the detection performance became worse below a detection rate of about 92% (Figure 60a and Figure 61a).

6.3.2.5 Training and Testing the NL-QGD with Cross Entropy

The cross entropy cost function was adopted as a network optimality index to maximize the likelihood of observing the training data set. The NL-QGDs trained on the cross entropy put more emphasis on both small errors and large errors than those trained with the Lp (p > 2) norms.

The training of the NL-QGDs was stopped at the epoch where the cross entropy function started increasing. In Figure 62b, the false alarm curves for both the training set and the cross validation set decrease as the training epochs increase. As in the L8 case, more emphasis was put on large errors than on small errors. However, the cross entropy function also puts more emphasis on small errors, which are deemphasized in the large norm case. This improves the detection performance in the lower probability of detection ranges.
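
The difference in error emphasis can be seen by comparing output-layer local gradients. The sketch assumes a logistic output unit, a binary cross entropy cost, and an Lp cost of the form (1/p)|e|^p; these choices and the function names are assumptions for illustration, not details confirmed by the text.

```python
import math


def local_gradient_lp(d, y, p):
    """Negative gradient of (1/p)|d - y|^p with respect to the net input
    of a logistic output unit (derivative y * (1 - y))."""
    e = d - y
    return math.copysign(abs(e) ** (p - 1), e) * y * (1.0 - y)


def local_gradient_cross_entropy(d, y):
    """Negative gradient of the binary cross entropy with respect to the
    net input of a logistic output unit; the y * (1 - y) factor cancels."""
    return d - y


# Small error (y close to the desired value) and large error (y far from it).
for y in (0.9, 0.1):
    print(y, local_gradient_cross_entropy(1, y), local_gradient_lp(1, y, 8))
```

Under these assumptions the cross entropy gradient stays proportional to the error itself, so neither small nor large errors are suppressed, whereas the L8 gradient falls off as the seventh power of the error.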

The NL-QGDs outperformed the QGD and the NL-QGDs trained on the Lp norms (Table 13 and Table 14). The ROCs of the NL-QGDs trained with cross entropy exhibited much more robust detection capability, as compared in Figure 64a and Figure 65a.


Table 13 Detection performance of NL-QGD trained on cross entropy in training set.

Network      Detection Rates                   SSE       Costs**   Stop
Topologies   100%   99%   98%   95%   92%                          iterations
NL-QGD731    11     9     3     1     1        0.00409   25.12     11272
NL-QGD751    10     10    3     1     1        0.00404   24.95     11231
NL-QGD771    10     10    3     1     1        0.00405   24.88     11689

For NL-QGD training, the learning rate (η) and momentum rate (α) were 0.9 and 0.7 respectively for all the networks.

Figure 62 Learning curve and false alarm curve (versus epochs) for 100% target detection of the NL-QGD751s trained based on the cross entropy function.

Figure 63 Training results of the NL-QGDs with a cross entropy function. a) ROC curves, b) outputs of NL-QGD751 for the training set.

Figure 64 Testing results of the NL-QGDs with the cross entropy function for the testing set #1.2. a) ROC curves, b) outputs of the NL-QGD751 trained on the cross entropy.

Figure 65 Testing results of the NL-QGDs with the cross entropy function for the testing set #2. a) ROC curves, b) outputs of the NL-QGD751 trained on the cross entropy.

Table 14 Detection performance of NL-QGD trained on cross entropy in testing set.

Entries are No. of false alarms in testing set #1.2 / No. of false alarms in testing set #2.

Network      Detection Rates                                   SSE
Topologies   100%       99%       98%     95%     92%
NL-QGD731    701/910    103/130   77/97   29/39   17/20        0.005356/0.004950
NL-QGD751    820/1064   102/129   70/90   27/36   17/20        0.005312/0.004959
NL-QGD771    885/1152   99/125    71/91   27/36   17/20        0.005309/0.004941

6.3.3 Summary of Detection Performance in CFAR/QGD and CFAR/NL-QGDs

The QGD reduced the 4455 false alarms produced by the two-parameter CFAR detector to 422 false alarms, thus achieving a discrimination power of about 1:10 (422/4455). The independent test results from Lincoln Laboratory also showed very promising discrimination using the QGD over a large database of real-life data.

The  NL-QGDs  were  trained  and  tested  based  on  different  optimality  indices  (Lp 
norm,  mixed  Lp  norm,  and  cross  entropy).  The  optimality  indices  influenced  the  detection 
performance  of  the  NL-QGDs. 

The Lp (p = 2) normed NL-QGDs produced smaller SSEs than the QGD but yielded large errors for outliers. This lowered the detection threshold and therefore caused excessive false alarms at high detection probabilities.

The mixed norm approach was an effort to reduce false alarm rates at high probabilities of detection by imposing a larger norm (p > 2) on the target class and a smaller norm (p < 2) on the non-target class. Penalizing errors from the target class more than those from the non-target class reduced the false alarms at 100% detection rate. In the training, the non-target outliers produced large errors, but these large errors were penalized less by the smaller norm. Training the NL-QGDs based on L2 with removal of non-target outliers also effectively reduced the false alarms at 100% detection. Hence these two training methods effectively handled the non-target outliers, and their detection performances were superior to those of the QGD and the NL-QGDs trained on the L8, L2, and cross entropy criteria. However, below 100% detection rate these two training methods (the mixed norm (L1.1/L8) and the L2 norm with removal of non-target outliers during the training) deteriorated the detection performances of the NL-QGDs.

A large norm (p > 2) effectively reduced large errors and improved the false alarms compared to the L2 normed NL-QGDs. At 100% detection, the L8 norm still led the NL-QGDs to produce more false alarms than the QGD, but about three times fewer than the NL-QGDs trained on the L2 norm. The L8 normed NL-QGDs showed more robust detection performance below 100% detection rate.

Contrary to the Lp norms, the cross entropy function was designed to maximize the likelihood of observing the training data set regardless of the pdf assumed for the training data set. For an infinitely large data set, this leads the network output to produce the a posteriori probability given a class input [5]. Since the outliers of each class have low probabilities (where the NL-QGDs trained on the cross entropy yielded large errors), a large number of false alarms was produced in the range between 100% and 98% detection. However, among the optimality indices used, the detection performance is the most robust below 98% detection rate.

As another figure of merit for assessing detection performance, detection robustness is defined as the area under an ROC curve. The QGDs and the NL-QGDs trained on the L2 norm with removal of non-target outliers and on the mixed norm (L1.1/L8) showed detection performances that dropped rapidly below about 90% detection rate. Table 15 summarizes the detection performance of the QGDs and the NL-QGDs based on the number of false alarms at high probabilities of detection (100%, 99%, 98%, 95%, and 92%).
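
The robustness measure can be computed from sampled ROC points with the trapezoidal rule. The sketch below assumes both axes are normalized to [0, 1]; the function name and this normalization are assumptions, since the ROC plots in this chapter may use raw false alarm counts on the horizontal axis.

```python
def roc_area(false_alarm_rates, detection_rates):
    """Area under an ROC curve by the trapezoidal rule.

    Both sequences are sampled operating points; they are sorted by
    false alarm rate before integration.
    """
    points = sorted(zip(false_alarm_rates, detection_rates))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# A detector whose detection rate collapses quickly as the threshold rises
# encloses less area than one that degrades gracefully.
print(roc_area([0.0, 0.1, 1.0], [0.0, 0.5, 1.0]))
print(roc_area([0.0, 0.1, 1.0], [0.0, 0.9, 1.0]))
```

A larger area indicates that the detector keeps a high detection rate over a wider range of false alarm rates, which is the sense in which the cross entropy trained NL-QGDs are called most robust.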


Table 15: Rank of Detection Performance

[Rotated two-page table ranking the QGDs and NL-QGDs at each detection rate; the cell entries are not legible in this copy.]


Table 16 compares the detection robustness among the detectors. Note that the detectors marked by the same number of asterisks ('*') in each column have the same detection performance (the same number of false alarms).


Table 16 Robustness rank of detection performances of the detectors (QGD and NL-QGDs).

Robustness Rank   Detectors
1                 NL-QGD_CE
2                 NL-QGD_L8
3                 NL-QGD_L2
4                 QGD_adapt
5                 QGD_LS
6                 NL-QGD_OL
7                 NL-QGD_L1.1/L8

6.3.4 γCFAR/QGD and γCFAR/NL-QGDs

In Section 6.2, the two prescreeners, the two-parameter CFAR and γCFAR detectors, were run over the Mission 90 Pass 5 SAR data set and their detection performances were compared. In Section 6.3.1, the QGD and the NL-QGDs were trained and tested as false alarm reducers in the false alarm reduction stage after the two-parameter CFAR detector. In this section, the QGD and NL-QGDs trained in Section 6.3.1 and Section 6.3.2 are applied to the ROIs selected by the γCFAR detector. Retraining the QGD and the NL-QGDs is not essential for use with the γCFAR detector because many detection points (after clustering) produced by the γCFAR detector overlap with those produced by the two-parameter CFAR detector (Figure 38).

The detection performances of the QGD and NL-QGDs at high probabilities of detection (100% to 92%) are tabulated in Table 17. In general, the NL-QGDs trained on the L8 and L1.1/L8 norms showed the best detection performances for the different network sizes (3, 5, 7 nodes) at 100% detection rate. This implies that imposing a large norm on the target class raised the 100% detection threshold, which prevented the excessive false alarms that a low threshold would produce. The NL-QGDs trained with removal of non-target outliers showed the second best detection performance and were followed by the QGDs trained with the LS and gradient descent methods, which produced 334 and 399 false alarms respectively at 100% detection rate. The cross entropy criterion and the L2 norm led the NL-QGDs to yield relatively large numbers of false alarms at 100% detection rate.

The NL-QGDs trained with removal of non-target outliers still showed good discrimination ability at 99% detection rate, whereas their detection performances were degraded in conjunction with the two-parameter CFAR detector. The QGD_LS showed the second best detection performance at 99% detection rate. The L1.1/L8 norm, L8 norm, and cross entropy function led the NL-QGDs to produce detection performance comparable with the QGD_adapt at 99% detection rate. The NL-QGDs trained on the L2 norm produced the most false alarms at 99% detection rate. At 98% detection rate, the NL-QGDs_L8 produced the best detection performance, followed by the NL-QGDs_CE, NL-QGDs_L2, NL-QGDs_OL, QGD_LS, NL-QGDs_L1.1/L8, and QGD_adapt. Below 95% detection rate, the NL-QGDs_CE and NL-QGDs_L8 produced fewer false alarms than the other norms. This is in agreement with the case of the QGD and NL-QGDs in conjunction with the two-parameter CFAR detector.

6.3.5 Fast Implementation of γCFAR/QGD and γCFAR/NL-QGDs

Currently, the implementation of the γCFAR and QGD requires two different optimal sets of μ_m and μ_n. The γCFAR and QGD find their optimal μ_m and μ_n values at different locations in the parameter space for their best target discrimination against clutter. This is because the γCFAR finds its optimal μ_m and μ_n with a restricted discriminant function while the QGD seeks its optimal μ_m and μ_n with 8 degrees of freedom in the weights of the discriminant function. Another reason for the dissimilarity between the two optimal sets of μ_m and μ_n is that for the γCFAR detector the highest pixel intensities of the training image chips were placed at the centers of the chips, thus resulting in a peaky


Table 17 Detection performance of the QGD and NL-QGDs in conjunction with the γCFAR detector (γCFAR/QGD and γCFAR/NL-QGDs).

Network               Detection Rates                   SSE
Topologies            100%   99%   98%   95%   92%
QGD_LS                334    100   94    74    52       0.06070
QGD_adapt             399    154   108   87    73       0.04580
NL-QGD731_L2          677    239   90    47    38       0.02912
NL-QGD751_L2          678    239   90    47    38       0.02911
NL-QGD771_L2          677    239   90    47    38       0.02910
NL-QGD731_OL          237    98    93    76    57       0.05171
NL-QGD751_OL          207    102   92    79    68       0.07480
NL-QGD771_OL          204    98    93    78    63       0.07212
NL-QGD731_L8          177    147   72    45    24       0.04623
NL-QGD751_L8          160    148   71    41    23       0.04586
NL-QGD771_L8          167    148   73    43    23       0.04639
NL-QGD731_L1.1/L8     176    146   96    75    63       0.03937
NL-QGD751_L1.1/L8     176    146   95    75    62       0.03987
NL-QGD771_L1.1/L8     173    145   95    75    62       0.03977
NL-QGD731_CE          451    145   88    28    23       0.02384
NL-QGD751_CE          431    152   86    27    22       0.02401
NL-QGD771_CE          434    182   86    27    22       0.02404

kernel (m = 1), while for the QGD the center points of the training image chips are centroids of detection points in the image so that the highest pixel intensities are no longer placed at the chip centers. This led the kernel (m = 1) of the QGD to stretch out to capture high pixel values under the kernel, thus making it broader than that of the γCFAR detector.

Focus of attention for ATD/R requires fast processing of the input data. If the QGD uses the same μ_m and μ_n values as the γCFAR detector, the computational bandwidth of the QGD can be reduced because 5 features, $(g_m \bullet X)$, $(g_n \bullet X)$, $(g_n \bullet X^2)$, $(g_n \bullet X)^2$, and $(g_m \bullet X)(g_n \bullet X)$, are required to be computed. This expedites the operation of the focus of attention stage in the multi-stage ATD/R system. However, using the γCFAR's μ_m and μ_n will degrade the detection performance of the QGD, as shown in Figure 39a. With the γCFAR's optimal μ_m and μ_n (μ_1 index = 22 and μ_15 index = 15), the QGD produced 280 false alarms (5 times the number of false alarms with the QGD's optimal μ_1 and μ_15 values). One way of obtaining a reasonable detection performance from the QGD at the γCFAR's optimal μ_1 and μ_15 is to train the QGD with training image chips in which the highest intensities are placed at the chip centers. This is because the QGD may then achieve its discrimination ability with a peakier kernel in the parameter space. Figure 66 shows the QGD false alarm surface for the same training image chips used in Figure 39a but with the highest pixel intensities at the chip centers.

The minimum number of false alarms was 38 and occurred at μ1 index 17 and μ15
index 14 in the parameter space. In this false alarm surface, the number of false alarms
was 76 at the γCFAR's optimal μ1 and μ15. With this training set, the QGD produced 3.5
times fewer false alarms than the QGD trained with the set used in Figure 39a.

In conclusion, using the same μ1 and μ15 for both the γCFAR detector and the QGD
speeds up the computation in the focus of attention stage and does not require an
exhaustive search of the parameter space, so that an optimal weight set of the QGD can be
obtained by the LS solution given a predetermined set of μ1 and μ15 values. This is a
significant advantage both in the QGD training time and in real-time operation of the
multi-stage ATD/R system.

Figure 66  False alarm surface versus μ1 and μ15. Note that in this training set the
image chips have the highest pixel intensities at the chip centers.


CHAPTER  7 


CONCLUSIONS AND FUTURE RESEARCH
7.1  Summary 

In this study, a novel approach to the design of a focus of attention stage was proposed
based on gamma kernels for a multi-stage ATD/R system. The focus of attention stage is
composed of two substages: a front-end detection stage and a false alarm reduction stage.

The front-end detection stage employs a prescreener to nominate potential target
locations in the image. Here we discussed a common prescreener, the two-parameter CFAR
detector, and extended it to the γCFAR detector. The two-parameter CFAR detector measures
a target mean with a target masking kernel and the clutter statistics (mean and variance)
with a clutter masking kernel in the CFAR stencil. The γCFAR detector incorporates a set of
2-D gamma kernels into the γCFAR stencil, which relaxes the constraint of a fixed CFAR
stencil size. A first order gamma kernel and a higher order gamma kernel (n > 1)
constitute the γCFAR stencil, in which the target mean is computed by the first order gamma
kernel and the clutter statistics (mean and variance) are estimated by the higher order
gamma kernel.
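
As a rough illustration of the stencil described above, the following sketch builds circularly symmetric 2-D gamma kernels and evaluates a γCFAR-style statistic at the center of a synthetic chip. The radial profile r^(n-1) exp(-μr), the normalization to unit sum, the test statistic (target mean minus clutter mean, divided by the clutter standard deviation), and all function names and parameter values are assumptions made for illustration; they are not taken from this chapter.

```python
import math
import numpy as np


def gamma_kernel_2d(order, mu, half_size):
    """Sample a circularly symmetric 2-D gamma kernel on a square grid.

    Assumed radial profile: r**(order - 1) * exp(-mu * r), rescaled to unit sum.
    """
    coords = np.arange(-half_size, half_size + 1)
    yy, xx = np.meshgrid(coords, coords, indexing="ij")
    radius = np.hypot(xx, yy)
    kernel = radius ** (order - 1) * np.exp(-mu * radius)
    return kernel / kernel.sum()


def gamma_cfar_statistic(chip, mu_target, mu_clutter, clutter_order):
    """Assumed gamma-CFAR statistic at the center of a square, odd-sized chip."""
    half = chip.shape[0] // 2
    g1 = gamma_kernel_2d(1, mu_target, half)               # peaked at the center
    gn = gamma_kernel_2d(clutter_order, mu_clutter, half)  # ring around the center
    target_mean = np.sum(g1 * chip)
    clutter_mean = np.sum(gn * chip)
    clutter_power = np.sum(gn * chip ** 2)
    clutter_std = math.sqrt(max(clutter_power - clutter_mean ** 2, 1e-12))
    return (target_mean - clutter_mean) / clutter_std


# Synthetic example: noisy flat background, then a bright 5x5 target at the center.
rng = np.random.default_rng(0)
chip = rng.normal(1.0, 0.1, size=(41, 41))
background_score = gamma_cfar_statistic(chip, 1.0, 1.0, 15)

chip_with_target = chip.copy()
chip_with_target[18:23, 18:23] += 5.0
target_score = gamma_cfar_statistic(chip_with_target, 1.0, 1.0, 15)
```

In this sketch the first order kernel has its maximum at the chip center, while the order 15 kernel is zero at the center and concentrates its weight on a ring, so changing μ or the order rescales the guard region without changing the code.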

The 2-D gamma kernels were obtained from their 1-D counterparts by a circularly
symmetric rotation. The 2-D gamma kernels preserve the 1-D characteristic of memory
depth in the 2-D domain. The memory depth is controlled by the kernel order n and the
parameter μ, which determine the shape and the scale of the γCFAR stencil. Thus the
γCFAR stencil can be adaptively set to better optimize a figure of merit (false alarm rates,
detector output errors, etc.).

The two-parameter CFAR detector and the γCFAR detector were interpreted from a
signal processing perspective. In the CFAR stencil, a local region of the image is
decomposed by two local projection operators (a target masking kernel and a clutter
masking kernel), which may be suboptimal due to the fixed window size. The γCFAR stencil,
however, locally projects the image region under analysis onto a gamma basis by two gamma
kernels (one having order n = 1 and the other having order n > 1). The parameter μ provides
a degree of freedom to rotate the gamma basis so that a figure of merit can be better
optimized.

In the detection performance comparison on the mission 90 pass 5 SAR data set,
the γCFAR detector outperformed the two-parameter CFAR detector in each frame and
yielded 760 false alarms while the two-parameter CFAR detector produced 4455 false
alarms (1:6 ratio). Overall, the ROC of the γCFAR detector exhibited more robust
detection performance than that of the two-parameter CFAR detector.

After the entire image is prescreened in the front-end detection stage, a more
sophisticated algorithm is applied in the false alarm reduction stage to the locations
nominated by the prescreener, to further discriminate objects of interest from clutter
(false positives).

The two-parameter CFAR detector was interpreted in terms of pattern recognition.
The two-parameter CFAR detector implements a restricted linear discriminant function of
quadratic terms of image intensities. From this perspective, the two-parameter CFAR
detector could be improved in three respects: (1) it uses only some of the quadratic terms
of the image intensity at a pixel and its surroundings; (2) it implements a fixed parametric
combination of these features; (3) there is little flexibility in the feature extraction
because the target masking and the clutter masking in the stencil are ad hoc.

The QGD, an extended form of the two-parameter CFAR detector, improved these
three aspects by exploiting all quadratic and linear terms of two image intensities (target
mean intensity and clutter mean intensity) extracted in the γCFAR stencil and then by
constructing a linear discriminant function based on these features. The optimal weights
are computed in closed form (LS solution) through an exhaustive search of the parameter
space, but they can also be found adaptively in the parameter and weight space in an
iterative manner by a gradient descent method.

Compared to the two-parameter CFAR detector, the QGD generalizes it with
respect to: (1) the shape of the kernels used for estimating the mean and variance; (2) the
selection of the weights of the decision function, which are not chosen a priori but are
found through optimization; (3) the use of all quadratic and linear terms of the two
intensity features (target mean and clutter mean).
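
The closed-form (LS) weight computation can be sketched as follows. This is a minimal sketch under stated assumptions: the feature vector is taken to be the quadratic and linear terms of the target mean a and the clutter mean b plus a bias term, the desired outputs are 1 for targets and 0 for clutter, and the data are synthetic. The function names and the exact feature ordering are hypothetical and are not the feature set defined earlier in this work.

```python
import numpy as np


def quadratic_features(target_mean, clutter_mean):
    """Stack assumed QGD-style features: a^2, b^2, a*b, a, b and a bias term."""
    a = np.asarray(target_mean, dtype=float)
    b = np.asarray(clutter_mean, dtype=float)
    return np.stack([a * a, b * b, a * b, a, b, np.ones_like(a)], axis=-1)


def fit_ls_weights(features, labels):
    """Least-squares weights of a linear discriminant on the given features."""
    weights, *_ = np.linalg.lstsq(features, labels, rcond=None)
    return weights


# Synthetic data: clutter has a ~ b, targets have a well above b.
rng = np.random.default_rng(1)
clutter_b = rng.uniform(0.5, 1.5, 200)
clutter_a = clutter_b + rng.normal(0.0, 0.05, 200)
target_b = rng.uniform(0.5, 1.5, 50)
target_a = target_b + 2.0 + rng.normal(0.0, 0.05, 50)

features = quadratic_features(
    np.concatenate([clutter_a, target_a]),
    np.concatenate([clutter_b, target_b]),
)
labels = np.concatenate([np.zeros(200), np.ones(50)])

weights = fit_ls_weights(features, labels)
scores = features @ weights  # detector output; threshold it to declare targets
```

For a fixed pair of kernel parameters the weights follow from one linear solve, which is why a predetermined parameter set removes the need for an exhaustive search.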

Our conclusion is that both the γCFAR detector and the QGD can be viable
alternatives to the two-parameter CFAR detector. The γCFAR detector can replace the
two-parameter CFAR detector as a prescreener. Notice that the raw features that the QGD
requires are a superset of the ones used for the γCFAR detector. So with one preprocessor
(QGD), we can implement both a detector and a false alarm reducer (or a discriminator).

The NL-QGD extended the linear structure of the QGD into a nonlinear structure, an
MLP. The NL-QGD has the potential to improve the detection performance since the MLP
is capable of creating arbitrary discriminant functions. Both the QGD and the NL-QGD
were tested and compared based on detection performance in the ROC. The QGD reduced
the 4455 false alarms triggered by the two-parameter CFAR detector to 422 false alarms at
100% detection rate, achieving a discrimination power of 1:10. The NL-QGDs trained
on the L2 norm outperformed the QGD over most of the ROC for different network sizes
(3, 5, and 7 nodes) but produced excessive false alarms at 100% detection rate due to the
nature of a nonlinear system.
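
The way false alarms are counted at a required detection rate can be sketched as follows. This is a minimal sketch, assuming that each candidate location has a scalar detector output and that the threshold is set at the score of the weakest target that must still be detected; the function name and the toy scores are hypothetical.

```python
import numpy as np


def false_alarms_at_detection_rate(target_scores, clutter_scores, detection_rate):
    """Count clutter scores at or above the threshold that detects the
    requested fraction of targets (threshold = score of the weakest kept target)."""
    targets = np.sort(np.asarray(target_scores, dtype=float))[::-1]
    # Number of targets that must be detected; the small offset guards
    # against floating point round-up in detection_rate * size.
    needed = max(int(np.ceil(detection_rate * targets.size - 1e-9)), 1)
    threshold = targets[needed - 1]
    return int(np.sum(np.asarray(clutter_scores, dtype=float) >= threshold))


# Toy scores: one weak target (0.2) forces a low threshold at 100% detection.
target_scores = [0.9, 0.8, 0.7, 0.2]
clutter_scores = [0.1, 0.3, 0.75, 0.85]

fa_100 = false_alarms_at_detection_rate(target_scores, clutter_scores, 1.00)
fa_75 = false_alarms_at_detection_rate(target_scores, clutter_scores, 0.75)
fa_50 = false_alarms_at_detection_rate(target_scores, clutter_scores, 0.50)
```

In this toy case a single low-scoring target determines the threshold at 100% detection, which illustrates why a detector can do well over most of the ROC and still produce many false alarms at that operating point.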

Since the NL-QGD is designed to operate at high probabilities of detection, effort
was directed at reducing the false alarms at high detection rates. By incorporating a
higher norm (p > 2) into the NL-QGD cost function, the NL-QGDs trained on the L8 norm
produced 665, 632, and 640 false alarms at 100% detection rate. These false alarms still
outnumber those produced by the QGD, but the L8 normed NL-QGDs greatly improved the
performance below 100% detection rate. The detection performance of the NL-QGD with
mixed norms improved at 100% detection rate (292, 316, and 315 false alarms for 3, 5, and
7 nodes) and became inferior but comparable to the QGD below 98% detection rate.
Finally, the cross entropy function was used as the optimality index for the NL-QGD. The
NL-QGDs trained on the cross entropy function showed more robust performance than the
QGD, the L2 normed NL-QGDs, the L8 normed NL-QGDs, and the NL-QGD with the mixed
norms (L1.1 for the non-target class and L8 for the target class). With about 30% loss of
targets, they were able to completely reject false alarms.
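
The cost functions compared above can be sketched as follows. This is a minimal sketch, assuming desired outputs of 1 for targets and 0 for non-targets, an Lp cost defined as the mean of the absolute error raised to the power p, and network outputs in the open interval (0, 1) for the cross entropy. The function names and the toy outputs are hypothetical.

```python
import numpy as np


def lp_cost(outputs, desired, p):
    """Mean of |error|**p over all exemplars."""
    return float(np.mean(np.abs(desired - outputs) ** p))


def mixed_norm_cost(outputs, desired, p_target=8.0, p_clutter=1.1):
    """Mixed norm: L8 on the target class, L1.1 on the non-target class."""
    errors = np.abs(desired - outputs)
    powers = np.where(desired > 0.5, p_target, p_clutter)
    return float(np.mean(errors ** powers))


def cross_entropy_cost(outputs, desired, eps=1e-12):
    """Binary cross entropy; outputs are clipped away from 0 and 1."""
    y = np.clip(outputs, eps, 1.0 - eps)
    return float(-np.mean(desired * np.log(y) + (1.0 - desired) * np.log(1.0 - y)))


def worst_error_share(outputs, desired, p):
    """Fraction of the Lp cost contributed by the single largest error."""
    powered = np.abs(desired - outputs) ** p
    return float(powered.max() / powered.sum())


# Two targets (one badly missed) and two non-targets.
outputs = np.array([0.9, 0.2, 0.1, 0.6])
desired = np.array([1.0, 1.0, 0.0, 0.0])

l2 = lp_cost(outputs, desired, 2)
l8 = lp_cost(outputs, desired, 8)
share_l2 = worst_error_share(outputs, desired, 2)
share_l8 = worst_error_share(outputs, desired, 8)
```

In this example the badly missed target accounts for a much larger share of the L8 cost than of the L2 cost, which is the mechanism by which a higher norm pushes training toward the weakest targets.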

7.2  Future Work

The γCFAR stencil relaxed the constraint of a fixed CFAR stencil size. Even though the
γCFAR stencil can be adaptively set to optimize a figure of merit, the kernel orders were
determined a priori to cover the full scattering range of known targets.

However, multi-scale object detection may face range uncertainties for different
targets as well as differences between the sizes and shapes of target types projected over
all aspects and elevations. To address these expected problems, it may be necessary to
utilize a full set of gamma kernels to cover the ranges of possible situations. Potential
targets may then be declared by the highest scores measured over the entire set of gamma
kernels.

The use of the full set of gamma kernels for image analysis is left for further research.
Also, the determination of the optimal order to estimate the background mean and
variance has not been addressed yet, since there is a quasi-symmetry between a change in
order and a change in μ.

The projection onto the circularly symmetric 2-D gamma kernels is complete, so that an
input image can be recovered from the projection. Extending 1-D gamma kernels to 2-D
gamma kernels on which the projection could be complete will allow for image
reconstruction from a projection. The image analysis can be accomplished by the 2-D gamma
kernels, which can be pdfs.


APPENDIX 


POLARIZATION  BASIS  TRANSFORMATION 

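A.1  Representation of a Plane Wave Polarization

A plane wave E propagating in the +z-direction can be expressed as

$$\mathbf{E} = (E_x \mathbf{u}_x + E_y \mathbf{u}_y)\, e^{-jkz} \tag{153}$$

In (153), both $E_x$ and $E_y$ are complex and can be written for a time-varying plane wave as

$$E_x = E_{ox}\, e^{j\omega t}, \qquad E_y = E_{oy}\, e^{j(\omega t + \epsilon)} \tag{154}$$

where $\epsilon$ is the relative phase difference between $E_x$ and $E_y$, both of which are travelling in
the +z-direction. The time-varying plane wave $\mathcal{E}$ is expressed as

$$\mathcal{E} = \mathrm{Re}\{\mathbf{E}\} = E_{ox}\cos(\omega t - kz)\,\mathbf{u}_x + E_{oy}\cos(\omega t - kz + \epsilon)\,\mathbf{u}_y = \mathcal{E}_x \mathbf{u}_x + \mathcal{E}_y \mathbf{u}_y \tag{155}$$

where

$$\mathcal{E}_x = E_{ox}\cos(\omega t - kz) \tag{156}$$

$$\mathcal{E}_y = E_{oy}\cos(\omega t - kz + \epsilon). \tag{157}$$

Expanding the expression for $\mathcal{E}_y$ into

$$\frac{\mathcal{E}_y}{E_{oy}} = \cos(\omega t - kz)\cos(\epsilon) - \sin(\omega t - kz)\sin(\epsilon)$$

and combining it with (156) yields

$$\frac{\mathcal{E}_y}{E_{oy}} - \frac{\mathcal{E}_x}{E_{ox}}\cos(\epsilon) = -\sin(\omega t - kz)\sin(\epsilon) \tag{158}$$

It follows from (156) that

$$\sin(\omega t - kz) = \left[1 - \left(\frac{\mathcal{E}_x}{E_{ox}}\right)^2\right]^{1/2}$$

so (158) leads to

$$\left[\frac{\mathcal{E}_y}{E_{oy}} - \frac{\mathcal{E}_x}{E_{ox}}\cos(\epsilon)\right]^2 = \left[1 - \left(\frac{\mathcal{E}_x}{E_{ox}}\right)^2\right]\sin^2(\epsilon)$$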

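By rearranging the above expression, we have

$$\left(\frac{\mathcal{E}_y}{E_{oy}}\right)^2 - 2\left(\frac{\mathcal{E}_y}{E_{oy}}\right)\left(\frac{\mathcal{E}_x}{E_{ox}}\right)\cos(\epsilon) + \left(\frac{\mathcal{E}_x}{E_{ox}}\right)^2 = \sin^2(\epsilon) \tag{159}$$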


This is the equation of an ellipse making an angle $\alpha$ with the $(\mathbf{u}_x, \mathbf{u}_y)$-coordinate system
such that

$$\tan(2\alpha) = \frac{2 E_{ox} E_{oy} \cos(\epsilon)}{E_{ox}^2 - E_{oy}^2} \tag{160}$$

and the wave (153) is then said to be elliptically polarized. The ellipse of (159) is
graphically described in Figure 67.
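
The ellipse relation (159) and the tilt angle (160) can be checked numerically. The following sketch traces the two field components over one period, evaluates the residual of (159), and compares the direction of the farthest point of the trace with the angle predicted by (160). The amplitudes and phase difference are arbitrary example values and the function names are hypothetical. Both (159) and (160) depend on the phase difference only through its cosine and squared sine, so the check does not depend on the sign convention chosen for the phase.

```python
import numpy as np


def trace_field(e_ox, e_oy, eps, samples=100000):
    """Field components over one period of the phase (wt - kz)."""
    phase = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    return e_ox * np.cos(phase), e_oy * np.cos(phase + eps)


def ellipse_residual(ex, ey, e_ox, e_oy, eps):
    """Largest deviation of the traced points from the ellipse equation (159)."""
    x, y = ex / e_ox, ey / e_oy
    lhs = y ** 2 - 2.0 * x * y * np.cos(eps) + x ** 2
    return float(np.max(np.abs(lhs - np.sin(eps) ** 2)))


def tilt_angle(e_ox, e_oy, eps):
    """Major-axis angle from (160), resolved to the correct quadrant."""
    return 0.5 * np.arctan2(2.0 * e_ox * e_oy * np.cos(eps), e_ox ** 2 - e_oy ** 2)


e_ox, e_oy, eps = 2.0, 1.0, np.pi / 3.0  # arbitrary example values
ex, ey = trace_field(e_ox, e_oy, eps)

residual = ellipse_residual(ex, ey, e_ox, e_oy, eps)

# The farthest point from the origin lies on the major axis of the ellipse.
farthest = np.argmax(ex ** 2 + ey ** 2)
measured = float(np.arctan2(ey[farthest], ex[farthest]) % np.pi)
predicted = float(tilt_angle(e_ox, e_oy, eps) % np.pi)
```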


Figure 67  Graphical description of an elliptically polarized wave.

A particular wave can be stated in terms of its specific state of polarization. The
polarization states include linear polarization and right- or left-circular polarization,
each of which is a special case of elliptic polarization. The polarization state is
defined by the following parameters:
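• Linear polarization

$$\epsilon = 2\pi m \quad \text{where } m = 0, \pm 1, \pm 2, \ldots$$

$$\mathcal{E} = E_{ox}\cos(\omega t - kz)\,\mathbf{u}_x + E_{oy}\cos(\omega t - kz)\,\mathbf{u}_y \tag{161}$$

• Right-circular polarization

$$\epsilon = -\frac{\pi}{2} + 2\pi m \quad \text{where } m = 0, \pm 1, \pm 2, \ldots$$

$$E_{ox} = E_{oy} = E_o$$

$$\mathcal{E} = E_o\left(\cos(\omega t - kz)\,\mathbf{u}_x + \sin(\omega t - kz)\,\mathbf{u}_y\right) \tag{162}$$

• Left-circular polarization

$$\epsilon = \frac{\pi}{2} + 2\pi m \quad \text{where } m = 0, \pm 1, \pm 2, \ldots$$

$$E_{ox} = E_{oy} = E_o$$

$$\mathcal{E} = E_o\left(\cos(\omega t - kz)\,\mathbf{u}_x - \sin(\omega t - kz)\,\mathbf{u}_y\right) \tag{163}$$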

A linearly polarized wave can be synthesized from two oppositely polarized circular
waves of equal amplitude. In particular, if we add the right-circular wave of (162) to the
left-circular wave of (163), we get

$$\mathcal{E} = 2E_o\cos(\omega t - kz)\,\mathbf{u}_x \tag{164}$$

which has a constant amplitude vector of $2E_o\mathbf{u}_x$ and is therefore linearly polarized.


A.2  Circular-to-Linear and Linear-to-Circular Polarization Basis Transformations

If we use the x- and y-axes as a linear polarization basis for the horizontal (H) and
vertical (V) time-varying components of a plane wave, respectively, and write the
equation of an electric field (E) propagating in the +z-direction at the transmit antenna,
we get the following expression from [51]


Uh 

hXr 

U y 

(165) 


In (165), the constants are the intrinsic impedance of free space, the current I into the
transmit antenna terminals, the wavelength λ of the transmitted electric field, and the
distance r from the transmit antenna. If it is assumed that the fully polarimetric
scattering matrix S of a target for a linearly polarized transmit/receive antenna
configuration is known, then we can write the reflected wave at the receive antenna as

$$\begin{bmatrix} E_H^r \\ E_V^r \end{bmatrix} = \frac{1}{\sqrt{4\pi}\, r} \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix} \begin{bmatrix} E_H^t \\ E_V^t \end{bmatrix} \tag{166}$$

where

$$S = \begin{bmatrix} S_{HH} & S_{HV} \\ S_{VH} & S_{VV} \end{bmatrix} \tag{167}$$


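The reflected wave at the receive antenna can also be written with the fully
polarimetric circular scattering matrix $S^{(C)}$ of the target in left and right circular
component form, which leads to the following expression

$$\begin{bmatrix} E_R^r \\ E_L^r \end{bmatrix} = \frac{1}{\sqrt{4\pi}\, r} \begin{bmatrix} S_{RR} & S_{RL} \\ S_{LR} & S_{LL} \end{bmatrix} \begin{bmatrix} E_R^t \\ E_L^t \end{bmatrix} \tag{168}$$

where

$$S^{(C)} = \begin{bmatrix} S_{RR} & S_{RL} \\ S_{LR} & S_{LL} \end{bmatrix} \tag{169}$$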

The electric field (E) can be expressed in the circular polarization basis in right and
left circular component form as

$$\mathbf{E} = E_H \mathbf{u}_H + E_V \mathbf{u}_V = E_R \mathbf{u}_R + E_L \mathbf{u}_L \tag{170}$$


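where $E_R$ and $E_L$ indicate the complex right and left circular field components,
respectively. According to (161), (162) and (163), the circular basis ($\mathbf{u}_R$ and $\mathbf{u}_L$) is
related to the linear basis by

$$\mathbf{u}_R = \mathbf{u}_H - j\mathbf{u}_V, \qquad \mathbf{u}_L = \mathbf{u}_H + j\mathbf{u}_V \tag{171}$$

or

$$\begin{bmatrix} \mathbf{u}_R \\ \mathbf{u}_L \end{bmatrix} = \begin{bmatrix} 1 & -j \\ 1 & j \end{bmatrix} \begin{bmatrix} \mathbf{u}_H \\ \mathbf{u}_V \end{bmatrix} \tag{172}$$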

By substituting (171) into (170), the right- and left-circular components of the wave are
expressed in terms of the horizontal and vertical components

$$E_R = \tfrac{1}{2}(E_H + jE_V), \qquad E_L = \tfrac{1}{2}(E_H - jE_V) \tag{173}$$

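We substitute (173) into (168) for both the transmitted and the received wave. Because
the propagation direction changes and the coordinate system must remain right-handed
after reflection against the target, the signs of the coordinate axes change accordingly.
So we replace $E_V$ by $-E_V$ for the reflected wave in (168).

$$\begin{bmatrix} E_H^r - jE_V^r \\ E_H^r + jE_V^r \end{bmatrix} = \frac{1}{\sqrt{4\pi}\, r} \begin{bmatrix} S_{RR} & S_{RL} \\ S_{LR} & S_{LL} \end{bmatrix} \begin{bmatrix} E_H^t + jE_V^t \\ E_H^t - jE_V^t \end{bmatrix} \tag{174}$$

Rearranging (174) yields the following equation

$$\begin{bmatrix} E_H^r \\ E_V^r \end{bmatrix} = \frac{1}{\sqrt{4\pi}\, r} \begin{bmatrix} \tfrac{1}{2}(S_{RR} + S_{RL} + S_{LR} + S_{LL}) & \tfrac{j}{2}(S_{RR} - S_{RL} + S_{LR} - S_{LL}) \\ \tfrac{j}{2}(S_{RR} + S_{RL} - S_{LR} - S_{LL}) & \tfrac{1}{2}(-S_{RR} + S_{RL} + S_{LR} - S_{LL}) \end{bmatrix} \begin{bmatrix} E_H^t \\ E_V^t \end{bmatrix} \tag{175}$$

By equating (175) and (166), we obtain the following relationship between the linear
and circular scattering matrix elements of the target.

$$\begin{aligned}
S_{HH} &= \tfrac{1}{2}(S_{RR} + S_{RL} + S_{LR} + S_{LL}) \\
S_{HV} &= \tfrac{j}{2}(S_{RR} - S_{RL} + S_{LR} - S_{LL}) \\
S_{VH} &= \tfrac{j}{2}(S_{RR} + S_{RL} - S_{LR} - S_{LL}) \\
S_{VV} &= \tfrac{1}{2}(-S_{RR} + S_{RL} + S_{LR} - S_{LL})
\end{aligned} \tag{176}$$

$$\begin{aligned}
S_{RR} &= \tfrac{1}{2}(S_{HH} - jS_{HV} - jS_{VH} - S_{VV}) \\
S_{RL} &= \tfrac{1}{2}(S_{HH} + jS_{HV} - jS_{VH} + S_{VV}) \\
S_{LR} &= \tfrac{1}{2}(S_{HH} - jS_{HV} + jS_{VH} + S_{VV}) \\
S_{LL} &= \tfrac{1}{2}(S_{HH} + jS_{HV} + jS_{VH} - S_{VV})
\end{aligned} \tag{177}$$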
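
The two sets of relations (176) and (177) can be checked against each other numerically. The following sketch codes the circular-to-linear relations of (176) and the linear-to-circular relations of (177), with the S_RL and S_LR rows obtained here by inverting (176), and verifies that they are mutual inverses. The function names and the example scattering matrices are assumptions made for illustration.

```python
import numpy as np


def circular_to_linear(s_rr, s_rl, s_lr, s_ll):
    """Linear-basis scattering elements from circular-basis elements, as in (176)."""
    s_hh = 0.5 * (s_rr + s_rl + s_lr + s_ll)
    s_hv = 0.5j * (s_rr - s_rl + s_lr - s_ll)
    s_vh = 0.5j * (s_rr + s_rl - s_lr - s_ll)
    s_vv = 0.5 * (-s_rr + s_rl + s_lr - s_ll)
    return s_hh, s_hv, s_vh, s_vv


def linear_to_circular(s_hh, s_hv, s_vh, s_vv):
    """Circular-basis scattering elements from linear-basis elements, as in (177)."""
    s_rr = 0.5 * (s_hh - 1j * s_hv - 1j * s_vh - s_vv)
    s_rl = 0.5 * (s_hh + 1j * s_hv - 1j * s_vh + s_vv)
    s_lr = 0.5 * (s_hh - 1j * s_hv + 1j * s_vh + s_vv)
    s_ll = 0.5 * (s_hh + 1j * s_hv + 1j * s_vh - s_vv)
    return s_rr, s_rl, s_lr, s_ll


# Round trip on an arbitrary complex circular scattering matrix.
circular = (0.3 + 0.1j, -0.7 + 0.2j, 0.5 - 0.4j, 0.1 + 0.9j)
recovered = linear_to_circular(*circular_to_linear(*circular))

# Odd-bounce scatterer (S_HH = S_VV = 1): only the cross-circular terms remain.
odd_bounce = linear_to_circular(1.0, 0.0, 0.0, 1.0)

# Even-bounce scatterer (S_HH = 1, S_VV = -1): only the co-circular terms remain.
even_bounce = linear_to_circular(1.0, 0.0, 0.0, -1.0)
```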
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